PREFACE


[...] a group of Gosset’s relatives and friends decided to examine the possibility of re-issuing in a single volume all the scientific papers which he published between 1907 and 1937 under the pseudonym of “Student”. The project was a happy one, for a unity of purpose runs through the whole of his contributions. In nearly every case the origin of a paper lies in some problem which required solution in connexion with his or his colleagues’ work at the Dublin brewery and, since the brewer is concerned with chemistry and [...], the field of his contributions was the application of statistical method in the research and routine problems of both industry and agriculture. The practical directness of his methods of approach, his clear grasp of [...], his appreciation of the limitation of mathematics when applied to [...], and his insistence that statistical technique should be regarded as an aid to but not a substitute for common sense, have given to his work a value which will last, although the precise mathematical methods by which he derived his results may have been superseded.

He was a pioneer worker in a field which, during his later years, developed rapidly, and his work is intimately related to the historical development of [...]. It was inevitable that he made certain mistakes and that his proofs were not all correct, although it is surprising how right he was in general and how often he “got there first” by what was sometimes an inspired guess.

In these circumstances we have regarded our editorial role as a minor one; we have not attempted to point out every place where later work may have [...] or simplified the mathematical proofs, nor do we expect the reader to regard this volume as a text book. Where numerical or algebraic slips have been discovered, some of them possibly misprints, we have corrected these without comment unless they appeared seriously to modify the argument. Such few editorial comments as we have made appear in footnotes enclosed in square brackets and followed by [...]. In two instances the original paper contained [...] by the contemporary editor, Karl Pearson, and here we have inserted [...] to make the distinction clear. To assist the reader, Student’s references in the text to his own contributions have been followed by the number with which the article is headed in this volume, e.g. [2, p. 29]. The main papers have been reprinted in the order of date of publication, while a
few shorter miscellaneous contributions are added in a separate section at the end.

As a Foreword, an appreciation by Launce McMullen has been included which, with slight modifications, is the article headed “Student as a Man” that appeared in Biometrika, xxx (1938), pp. 205-10. For [...] Student’s personality and statistical work reference may be made to [...] by R. A. Fisher, Annals of Eugenics, ix (1939), pp. 1-9, and E. S. Pearson, Biometrika, xxx (1938), pp. 210-50; and to contributions by H. H. [...] W. and E. M. E. in the Journal of the Royal Statistical Society [...].

The papers have been collected from a number of sources and we must thank the following authorities for freely granting permission for their [...] in the present volume: the Editor of the Annals of Eugenics; Messrs Baillière, Tindall and Cox, the Publishers of Baillière’s Encyclopaedia of Scientific Agriculture; the Trustees of Biometrika; the Editor of the Eugenics Review; the Editor of the Journal of Agricultural Science; the Editor of the Journal of the American Society of Agronomy; the Director of Metron; the Proprietors of Nature; the Council of [...]

We are very grateful to Dr R. C. Geary and Mr E. Somerfield for assistance [...]

We should like to thank Mrs W. S. Gosset and her brother, Mr U. Phillpotts, for giving us this opportunity as joint editors of helping [...] inspiration in the past [...] much. On their behalf we must also thank the Trustees of Biometrika for accepting responsibility for publication and Mr Walter Lewis of the Cambridge University Press for invaluable help in the arrangement and printing of the volume.

E. S. PEARSON
JOHN WISHART

September 1942




FOREWORD 


William Sealy Gosset was born in 1876, the eldest of four sons and a
daughter. His father was Colonel Frederic Gosset, R.E., who married Agnes
Sealy Vidal in 1875. The Gossets were an old Huguenot family who left France
at the Revocation of the Edict of Nantes. 

He was a Scholar of Winchester, and wishing to join the Royal Engineers 
passed into Woolwich but was rejected in the subsequent medical examination 
(again in 1916 he wished to volunteer for the Army but was rejected for short 
sight). He then went as a Scholar to New College, Oxford, where he obtained 
First Classes in Mathematical Moderations and in Natural Science. In the 
autumn of 1899 he went as a Brewer to Messrs Guinness in Dublin.

In 1906 he married Marjory Surtees Phillpotts, youngest daughter of the late 
Headmaster of Bedford School. She was at about that time Captain of the 
English Ladies Hockey Team, and subsequently she played for, and captained, 
the Irish Team. They had one son and two daughters. 

He died on 16 October 1937 and was survived by both his parents, his wife 
and children and one grandson. 

It is not known exactly how or when “Student’s” interest in statistics was 
first aroused, but at this period scientific methods and laboratory determinations 
were beginning to be seriously applied to brewing, and it is obvious that some 
knowledge of error functions would be necessary. A number of university men 
with science degrees had been taken on, and it is probable that “Student”, who 
was the most mathematical of them, was appealed to by the others with various 
questions and so began to study the subject. It is known that he could calculate 
a probable error in 1903. The circumstances of brewing work, with its variable 
materials and susceptibility to temperature change and necessarily short series 
of experiments, are all such as to show up most rapidly the limitations of large 
sample theory and emphasize the necessity for a correct method of treating 
small samples. It was thus no accident, but the circumstances of his work, that 
directed “Student’s” attention to this problem, and so led to his discovery of 
the distribution of the sample standard deviation, which gave rise to what in 
its modern form is known as the t-test. For a long time after its discovery and
publication the use of this test hardly spread outside Guinness’s brewery, where 
it has been very extensively used ever since. In the Biometric school at 
University College the problems investigated were almost all concerned with 
much larger samples than those in which “studentizing”, as it was sometimes 
called, made any difference. Nevertheless, although their lines of research 
diverged somewhat rapidly, the close statistical contact and personal friendship
between Karl Pearson and “Student”, which began during his year at University
College, were only terminated by death.

The purpose of this note is not, however, to give an account of “Student’s”
statistical work, but to try to give a more general impression of the man himself. 
Although his public reputation was entirely as a statistician, and he was 
acknowledged to be one of the leading investigators in that subject, his time was 
never wholly and rarely even mainly occupied with statistical matters. For one 
who saw enough of him to know roughly how his time was spent both at work 
and at home, it was very difficult to understand how he managed to get so 
much activity into the day. At work he got through an enormous amount of 
the ordinary routine of the brewery, as well as his statistics. Until 1922 he had 
no regular statistical assistant, and did all the statistics and most of the 
arithmetic himself; later there was a definite department, of which he was in 
charge till 1934, but throughout he did a great deal of arithmetic and spadework
himself. It might be supposed from the amount he did in the time that
he was unusually good at arithmetic and the arrangement of work; such,
however, was not the case, for his arithmetic frequently contained minor errors.
In one of his obituary notices a tendency to do work on the backs of envelopes
in trains was mentioned, but this tendency was not confined to trains; even in 
his office much work was done on random scraps of paper. He also had a great 
dislike of the tabulation of results, and preferred to do everything from first 
principles whenever possible. This preference led in certain instances to waste 
of time in routine work, but was of assistance in maintaining that flexibility and 
speed of attack on new problems which was so characteristic of him. An actual 
example would need too much explanation of relevant circumstances, but I can 
vouch for the analogical truth of the following. If a body performs simple
harmonic motion with acceleration $\mu$ per unit displacement, it may readily be
shown that the period of a complete oscillation is $2\pi/\sqrt{\mu}$. Hence, in the case of
a simple pendulum $t = 2\pi\sqrt{l/g}$ and $l = gt^2/4\pi^2$, where $l$ is the length of the
pendulum and $g$ the acceleration due to gravity. If it were necessary to calculate
the lengths of pendulum corresponding to different periods as a routine matter,
most people would evaluate $g/4\pi^2$ for their locality and always multiply $t^2$ by
this numerical constant, which would be about 24.85. “Student” would probably
have started from $2\pi/\sqrt{\mu}$ every time. If therefore he had suddenly wanted to
calculate the period of oscillation of a weight on a stretched spring he could
have done it, whereas the man who only remembered that $l = 24.85\,t^2$ for a
pendulum would be unable to tackle the problem without much more preliminary
work.
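The pendulum analogy can be made concrete with a short Python sketch. It is an illustration only. The text gives no units, so $g = 981$ cm s$^{-2}$ is assumed here because that value reproduces the quoted constant 24.85. The function names are hypothetical. The sketch contrasts the memorized constant with the first-principles route through $2\pi/\sqrt{\mu}$, and that same route also covers the spring.

```python
import math

# Assumption: g in cm/s^2; this value reproduces the constant 24.85 in the text.
G_CM_PER_S2 = 981.0


def period_shm(mu: float) -> float:
    """Period of simple harmonic motion with acceleration mu per unit displacement."""
    return 2 * math.pi / math.sqrt(mu)


def pendulum_length_from_period(t: float, g: float = G_CM_PER_S2) -> float:
    """First principles: mu = g / l and t = 2*pi / sqrt(mu), so l = g t^2 / (4 pi^2)."""
    return g * t**2 / (4 * math.pi**2)


def spring_period(k: float, mass: float) -> float:
    """A weight on a spring of stiffness k: restoring acceleration k / mass per unit stretch."""
    return period_shm(k / mass)


# The memorized "routine" constant that multiplies t^2
ROUTINE_CONSTANT = G_CM_PER_S2 / (4 * math.pi**2)
print(round(ROUTINE_CONSTANT, 2))  # 24.85
```

The spring case needs no new constant. It only needs a different value of $\mu$, which illustrates the kind of versatility the anecdote describes.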

His method was, of course, not necessarily the most suitable for others not 
aspiring to the same degree of versatility. Perhaps it is not altogether fanciful 
to compare the two methods with the organic evolution of, say, the human hand,
the most versatile object known, and the construction of some highly efficient 
but absolutely specialized piece of machinery. I do not mean to imply that he 
gave this explanation, or was even altogether conscious of it. When he handed 
over to me a routine calculation which he had done for many years, I was 
astonished to find that he had written out every week an almost unvarying form 
of words with different figures. To my question, “Why ever don’t you get a 
printed form?” he did not reply, “Doing it from first principles every time 
preserves mental flexibility”. He would have considered such a remark unbearably
pompous. He said, “Because I’m too lazy”, to which I replied
“Well, I’m too lazy not to.” 

To many in the statistical world “Student” was regarded as a statistical 
adviser to Guinness’s brewery; to others he appeared to be a brewer devoting 
his spare time to statistics. I have tried to show that though there is some 
truth in both of these ideas they miss the central point, which was the intimate 
connexion between his statistical research and the practical problems on which 
he was engaged. I can imagine that many think it wasteful that a man of his 
undoubted genius should have been engaged in industry, yet I am sure that it is 
just that association with immediate practical problems which gives “Student’s”
work its unique character and importance relative to its small volume. On at 
least one occasion he was offered an academic appointment, but it is almost 
certain that he would not have been a successful lecturer, though perhaps a good 
individual teacher; nor is it likely that his research work would have flourished 
in more academic circumstances; his mind worked in a different way. 

The work in connexion with barley breeding carried out by the Department 
of Agriculture in Ireland, in which Messrs Guinness took a prominent part, 
enabled “Student” to get that first-hand experience of yield trials and agricultural
experiments generally which contributed so largely to his great knowledge
of the subject. He did not merely sit in his office and calculate the results, but 
discussed all the details and difficulties with the Department officials, and went 
round all the experiments before harvest, when a “grand tour” is annually 
carried out by the Department, the brewery, and sometimes statisticians or 
others interested from England or abroad. As well as the work carried out at the 
actual cereal station near Cork, three or four varieties of barley are grown in 
½ or 1 acre plots at ten farms representing all the principal barley-growing
districts of Ireland, so a visit to all of them entails a fairly comprehensive 
inspection of the crops. 

“Student” took a great deal of interest in this work from the beginning and
correspondence shows that he discussed the results of these tests with Karl
Pearson at great length when he went to study with him at University College
in 1906.
In the last ten years or so of his time in Ireland he played a leading part in
these investigations, and thus had a perhaps unique opportunity of following 
experimental varieties from sowing through growing and harvest to malting 
and brewing results, and also of carrying out or supervising all the relevant 
mathematical work. At one time he also made some barley crosses in his own 
garden, and accelerated their multiplication by having one generation grown 
in New Zealand during our winter. These crosses were known as Student I and 
II, and have now been discarded as failures, the inevitable fate of the large 
majority. With characteristic self-effacement he was the first to point out that
they were not worth going on with. 

He also made frequent visits to Dr E. S. Beaven, whose work on barley 
breeding is well known, and discussed every aspect of yield trials with him. 
These visits were undoubtedly very useful, and although Dr Beaven was never 
tired of protesting that he was no mathematician and did not understand 
“magic squares” or “birds of freedom”, names which he preferred to the more 
orthodox expressions, he had a vast experience of agricultural trials and was 
very quick to see the weak point of any experiment. 

In spite of the quantity of work “Student” did he was never in a hurry or 
fussed; this was largely due to the absence of lag when he turned his mind to a 
new subject; unfortunately others were not always equal to this. He would 
ring one up on the telephone and plunge straight into some subject which might 
have been discussed some days previously. The slower-witted listener would 
probably lose the thread of his discourse before realizing what it was about and 
would ignominiously have to ask him to begin again. I have many times seen 
him hard at it on a Monday morning, but at first meeting it was always “How 
did the sailing go?” “Well, did you catch any fish?”, and he would recount any
notable event of his own week-end before plunging into the very middle of some 
subject. I never heard him say “I’m busy”. 

“Student” had many correspondents, mostly agricultural and other experimenters,
in different parts of the world. He took immense pains with these
and often explained points to them at great length when he could easily have 
given a reference. His letters contain some of his clearest writing, and the more
difficult points are often better elucidated than in his published papers.

Karl Pearson emphasized the fact that a statistician must advise others on 
their own subject, and so may incur the accusation of butting in without 
adequate knowledge. “Student” was particularly expert at avoiding any such 
disagreement; usually he was such an enthusiastic learner of the other’s subject 
that the fact that he was giving advice escaped notice. 

The reader will by now have realized that “Student” did a very large quantity
of ordinary routine as well as his statistical work in the brewery, and all that in 
addition to consultative statistical work and to preparing his various published 
papers. It might thus be thought that he could have done nothing else but eat
and sleep when at home; this, however, was far from being the case, and he had 
a great many domestic and sporting interests. He was a keen fruit-grower and 
specialized in pears. He was also a good carpenter, and built a number of boats; 
the last, which was completed in 1932, and on whose maiden voyage I had the 
honour to be nearly frozen to death, was equipped with a rudder at each end 
by means of which the direction and speed of drift could be adjusted—an 
advantage which will be readily appreciated by fly-fishermen. This boat with 
its arrangement of rudders was described in the Field of 28 March 1936. In his 
carpentry he showed preferences analogous to his mathematical ones previously 
mentioned; he disliked complicated or specific tools, and liked to do anything 
possible with a pen-knife. On one occasion, seeing him countersinking screw-holes
with a pocket-knife, I offered him a proper countersink bit which I had
with me, but he declined it with some embarrassment, as he would not have 
liked to explain or perhaps could not have explained why he preferred using the 
pen-knife. Out of doors he was an energetic walker and also cycled extensively 
in the pre-war period. He did a lot of sailing and fishing. For his last boat he 
had a most unconventional sail, which cannot be exactly described under any 
of the usual categories; it was illustrated in the Field article referred to above. 

In fishing he was an efficient performer; he used to hold that only the size 
and general lightness or darkness of a fly were important; the blue wings, red 
tails and so on being only to attract the fisherman to the shop. This view was 
more revolutionary when I first heard it than it is now. He was a sound though 
not spectacular shot, and was well above the average on skates. Until the 
accident to his leg in 1934 he was quite a regular golfer, and once went round a 
fairly difficult course in 85 strokes and 1½ hours by himself. He used a remarkable
collection of old clubs dating at least from the beginning of the century. In the 
last few years since his accident he took up bowls with great keenness, and 
induced many other people to play as well. One of his last visits to Ireland was 
with a team which he had organized at the new brewery at Park Royal. 

On top of all this he knew as much as most people of the affairs of the world 
in general and of what was going on about him. It became very difficult to 
imagine how he found 24 hours in any way a sufficient length for the day. His 
wife certainly organized things so that the minimum amount of time was wasted, 
but even so few people could approach such activity in quantity or diversity. 

In personal relationships he was very kindly and tolerant and absolutely 
devoid of malice. He rarely spoke about personal matters but when he did his 
opinion was well worth listening to and not in the least superficial. 

In the summer of 1934 he had a motor accident and broke the neck of his 
femur. He had to lie up for three months, of course working at statistics, and was 
a semi-cripple for a year. This was particularly irksome for such an active man, 
as was the sheer unnecessariness of the accident, for he ran into a lamp-post on a
straight road, through looking down to adjust some stuff he was carrying; but 
with great hard work and persistence he eventually reduced the disability to a 
slight limp. 

At the end of 1935 he left Ireland to take charge of the new Guinness brewery 
in London, and I saw comparatively little of him after that. The departure 
from Ireland of “Student” and his family was a great loss to many who had 
experienced their hospitality. His work in London was necessarily very hard 
and accompanied by all the vexations inevitably associated with a big undertaking
in its first stages, before any settled routine has been established;
nevertheless, he still found time to continue his statistical work and wrote 
several papers. 

His death at the comparatively early age of 61 was not only a heavy blow to 
his family and friends, but a great loss to statistics, as his mind retained its full 
vigour, and he would undoubtedly have continued to work for many more years. 

I am very conscious of the inadequacy of this sketch, which cannot hope 
to convey more than a faint impression of his unique personal quality to those 
who did not know him, but it will have served its purpose if it helps any readers 
to grasp the essential unity and directness of the personality which lay behind 
such widely varied manifestations. 


LAUNCE McMULLEN
ON THE ERROR OF COUNTING WITH
A HAEMACYTOMETER

[Biometrika, V (1907), p. 351]

When counting yeast cells or blood corpuscles with a haemacytometer there are 
two main sources of error: (1) the drop taken may not be representative of the
bulk of the liquid; (2) the distribution of the cells or corpuscles over the area which 
is examined is never absolutely uniform, so that there is an “error of random 
sampling”. 

With the first source of error we are concerned only to this extent: that when
the probable error of random sampling is known we can tell whether the various 
drops taken show significant differences. What follows is concerned with the 
distribution of particles throughout a liquid, as shown by spreading it in a thin 
layer over a measured surface and counting the particles per unit area. 

Theoretical Consideration 

Suppose the whole liquid to have been well mixed and spread out in a thin layer
over $N$ units of area (in the haemacytometer the usual thickness is 0.01 mm and
the unit area 1/400 sq. mm).

Let the particles subside and let there be on an average $m$ particles per unit
area, that is $Nm$ altogether. Then, assuming the liquid has been properly mixed,
a given particle will have an equal chance of falling on any unit area: that is,
the chance of its falling in a given unit area is $1/N$ and of its not doing so $1 - 1/N$.

Consequently, considering all the $mN$ particles, the chances of 0, 1, 2, 3, ...
particles falling on a given area are given by the terms of the binomial

$$\left\{\left(1-\frac{1}{N}\right)+\frac{1}{N}\right\}^{mN},$$

and if $M$ unit areas be considered the distribution of unit areas containing
0, 1, 2, 3, ... particles is given by

$$M\left\{\left(1-\frac{1}{N}\right)+\frac{1}{N}\right\}^{mN}.$$

Now in practice N is to be measured in millions and may be taken as infinite. 
Let us find the limit when N is infinite of the general term of this expansion.

The $(r+1)$th term is

$$\left(1-\frac{1}{N}\right)^{mN-r}\left(\frac{1}{N}\right)^{r}\frac{mN(mN-1)(mN-2)\cdots(mN-r+1)}{r!}$$

$$=\left\{1-\frac{mN-r}{N}+\frac{(mN-r)(mN-r-1)}{N^2\,2!}-\cdots+(-1)^s\frac{(mN-r)\cdots(mN-r-s+1)}{N^s\,s!}+\cdots\right\}\times\frac{m\left(m-\frac{1}{N}\right)\left(m-\frac{2}{N}\right)\cdots\left(m-\frac{r-1}{N}\right)}{r!}.$$

But when we proceed to the limit, $\frac{1}{N}, \frac{2}{N}, \ldots, \frac{r-1}{N}, \frac{r}{N}, \frac{r+1}{N}, \ldots, \frac{r+s-1}{N}$ are
all negligibly small compared to $m$, so that the expression reduces to

$$\left\{1-m+\frac{m^2}{2!}-\cdots+(-1)^s\frac{m^s}{s!}+\cdots\right\}\times\frac{m^r}{r!}=e^{-m}\times\frac{m^r}{r!}.$$

That is to say, the expansion is equal to

$$e^{-m}\left(1+m+\frac{m^2}{2!}+\cdots+\frac{m^r}{r!}+\cdots\right).$$

Hence it is this distribution with which we are concerned. 
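The limit just taken can be checked numerically. The Python sketch below uses hypothetical function names and the arbitrary illustrative values $m = 2$, $r = 3$. It evaluates the exact $(r+1)$th term of $\{(1-1/N)+1/N\}^{mN}$ for increasing $N$ and prints its distance from $e^{-m}m^r/r!$. That gap shrinks steadily as $N$ grows.

```python
import math


def binomial_term(r: int, m: float, N: int) -> float:
    """(r+1)th term of {(1 - 1/N) + 1/N}^(mN): the chance that a given unit
    area receives exactly r of the mN particles."""
    n = round(m * N)  # total particles; assumes m*N is a whole number
    return math.comb(n, r) * (1 / N) ** r * (1 - 1 / N) ** (n - r)


def poisson_term(r: int, m: float) -> float:
    """The limiting form e^{-m} m^r / r!."""
    return math.exp(-m) * m**r / math.factorial(r)


m, r = 2.0, 3
for N in (10, 100, 10_000, 1_000_000):
    gap = abs(binomial_term(r, m, N) - poisson_term(r, m))
    print(f"N = {N:>9,}: |binomial term - limit| = {gap:.2e}")
```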

The first moment about the origin, O, taken at zero number of particles is

$$e^{-m}\left(m+\frac{2m^2}{2!}+\frac{3m^3}{3!}+\cdots+\frac{rm^r}{r!}+\cdots\right)
= me^{-m}\left(1+\frac{m}{1!}+\frac{m^2}{2!}+\cdots+\frac{m^{r-1}}{(r-1)!}+\cdots\right)
= m\times\text{total frequency}.$$

Hence the mean is at $m$.

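The second moment about the point O is

$$e^{-m}\left(m+\frac{2^2m^2}{2!}+\frac{3^2m^3}{3!}+\cdots+\frac{r^2m^r}{r!}+\cdots\right)
= e^{-m}\left(m+\frac{2m^2}{1!}+\frac{3m^3}{2!}+\cdots+\frac{rm^r}{(r-1)!}+\cdots\right)$$

$$= e^{-m}\left\{\left(m+\frac{m^2}{1!}+\cdots+\frac{m^r}{(r-1)!}+\cdots\right)+\left(m^2+\frac{2m^3}{2!}+\cdots+\frac{(r-1)m^r}{(r-1)!}+\cdots\right)\right\}$$

$$= (m+m^2)\times\text{total frequency}.$$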
Hence the second moment coefficient about the mean

$$= m + m^2 - m^2 = m.$$

By similar* methods the moment coefficients up to $\mu_6$ were obtained, as follows:

$\mu_1' = m$,
$\mu_2 = m$,
$\mu_3 = m$,
$\mu_4 = 3m^2 + m$,
$\mu_5 = 10m^2 + m$,
$\mu_6 = 15m^3 + 25m^2 + m$.

Hence $\beta_1 = \mu_3^2/\mu_2^3 = 1/m$ and $\beta_2 = \mu_4/\mu_2^2 = 3 + 1/m$.

It will be observed that the limit to which this distribution approaches as $m$
becomes infinite is the normal curve with its $\beta_1$, $\beta_3$, $\beta_5$, etc. all equal to 0, and
$\beta_2 = 3$, $\beta_4 = 15$, etc.
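The closed forms listed above can be compared with moments computed by summing the series directly. The Python sketch below is illustrative only. It truncates the series at 100 terms and uses $m = 2.5$, and both of these are arbitrary choices. It converts raw moments to moments about the mean with the usual binomial formula rather than with a term-by-term recurrence. The helper names are hypothetical.

```python
import math


def poisson_raw_moment(n: int, m: float, terms: int = 100) -> float:
    """sum_r r**n * e^{-m} m^r / r!, truncated after `terms` terms."""
    total, prob = 0.0, math.exp(-m)  # prob holds e^{-m} m^r / r!
    for r in range(terms):
        total += r**n * prob
        prob *= m / (r + 1)
    return total


def central_moment(k: int, m: float) -> float:
    """mu_k from the raw moments: sum_j C(k, j) * mu'_j * (-m)^(k - j)."""
    return sum(math.comb(k, j) * poisson_raw_moment(j, m) * (-m) ** (k - j)
               for j in range(k + 1))


# Closed forms quoted in the text
CLOSED_FORMS = {
    2: lambda m: m,
    3: lambda m: m,
    4: lambda m: 3 * m**2 + m,
    5: lambda m: 10 * m**2 + m,
    6: lambda m: 15 * m**3 + 25 * m**2 + m,
}

m = 2.5
for k, formula in CLOSED_FORMS.items():
    print(k, round(central_moment(k, m), 6), formula(m))
beta1 = central_moment(3, m) ** 2 / central_moment(2, m) ** 3
beta2 = central_moment(4, m) / central_moment(2, m) ** 2
print(beta1, 1 / m, beta2, 3 + 1 / m)
```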


Further, any binomial $(p+q)^n$ can be put into the form $(p+q)^{m/q}$, where $m = nq$, and if $q$ be
small and $nq$ not large it approaches the distribution just given.

Thus if $1000\left(\frac{99}{100}+\frac{1}{100}\right)^{500}$ be expanded, the greatest difference between any
of its terms and the corresponding term of

$$1000\,e^{-5}\left(1+5+\frac{5^2}{2!}+\cdots+\frac{5^r}{r!}+\cdots\right)$$

is never as much as 1, being about 0.8 for the term $1000\,e^{-5}\,\frac{5^5}{5!}$, which is 175.5
against 176.3 from the binomial.

Diagram I compares the exponential $1000\,e^{-5}\left(1+5+\frac{5^2}{2!}+\cdots+\frac{5^r}{r!}+\cdots\right)$ with the binomial
$1000\left(\frac{19}{20}+\frac{1}{20}\right)^{100}$, which of course differ, but not by very much.

* The evaluation of the moments about the point O will be found to depend on the
expansion of $r^n$ in the form

$$r^n = r\left\{\frac{(r-1)!}{(r-n)!}+a_1\frac{(r-1)!}{(r-n+1)!}+a_2\frac{(r-1)!}{(r-n+2)!}+\cdots+a_{n-1}\frac{(r-1)!}{(r-1)!}\right\}$$

$$= r!\left\{\frac{1}{(r-n)!}+\frac{a_1}{(r-n+1)!}+\frac{a_2}{(r-n+2)!}+\cdots+\frac{a_{n-1}}{(r-1)!}\right\}.$$

Then if we form the series for $n+1$ from this it will be found that the following relations
hold between $a_1, a_2, a_3$, etc. and the corresponding coefficients for $n+1$, $A_1, A_2, A_3$, etc.:

$$A_1 = a_1 + n,\qquad A_2 = a_2 + (n-1)a_1,\qquad \ldots,\qquad A_p = a_p + (n-p+1)a_{p-1}.$$

From these equations we can write down any number of moments about the point O in
turn, and from these may be found the moments about the mean by the ordinary formulae.

The moments may also be deduced from the point binomial $(p+q)^n$ when $q$ is small
and $n$ large and $nq = m$, i.e. $p = 1$, $q = 0$, $nq = m$. We have

$$\mu_1' = nq = m,\qquad \mu_2 = npq = m,\qquad \mu_3 = npq(p-q) = m,$$

$$\mu_4 = npq\{1+3(n-2)pq\} = m(1+3m) = 3m^2+m.$$


Diagram I. Comparison of the Exponential and Binomial Expansions. [Firm line: $1000\,e^{-5}\left(1+5+\cdots+\frac{5^r}{r!}+\cdots\right)$; broken line: $1000\left(\frac{19}{20}+\frac{1}{20}\right)^{100}$.]
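The two comparisons of the exponential series with a point binomial can be reproduced term by term. The Python sketch below uses hypothetical function names. It builds each series from the ratio of successive terms, which avoids overflowing powers such as $5^{500}$. It then locates the largest difference between $1000\left(\frac{99}{100}+\frac{1}{100}\right)^{500}$ and $1000\,e^{-5}(1+5+\cdots)$.

```python
import math


def point_binomial(n: int, q: float, total: float = 1000.0) -> list[float]:
    """total * (p + q)^n expanded term by term, for 0, 1, ..., n successes."""
    p = 1.0 - q
    term = total * p**n
    terms = [term]
    for r in range(n):
        term *= (n - r) / (r + 1) * q / p  # ratio of successive binomial terms
        terms.append(term)
    return terms


def exponential_series(m: float, count: int, total: float = 1000.0) -> list[float]:
    """total * e^{-m} m^r / r! for r = 0, 1, ..., count - 1."""
    term = total * math.exp(-m)
    terms = []
    for r in range(count):
        terms.append(term)
        term *= m / (r + 1)
    return terms


binom = point_binomial(500, 1 / 100)          # 1000 (99/100 + 1/100)^500
expo = exponential_series(5.0, len(binom))    # 1000 e^{-5} (1 + 5 + 5^2/2! + ...)
diffs = [abs(b - e) for b, e in zip(binom, expo)]
worst = max(range(len(diffs)), key=diffs.__getitem__)
print(worst, round(expo[worst], 2), round(binom[worst], 2), round(diffs[worst], 2))
```

Calling `point_binomial(100, 1 / 20)` in place of the first call gives the broken line of Diagram I.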



In applying this to actual cases it must be noted that we have not taken into 
account any “interference” between the particles; there has been supposed the 
same chance of a particle falling on an area which already has several particles 
as on one altogether unoccupied. Clearly if m be large this will not be the case, 
but with the dilutions usually employed this is not of any importance. 

It will be shown that the actual distributions which were tested do not diverge 
widely from this law, so we will consider the probable error of random sampling 
on the supposition that they follow it. 
We have seen that $\mu_2 = m$.

Hence the standard deviation $= \sqrt{m}$.

So that if we have counted $M$ unit areas the probable error of our mean ($m$) is
$0.67449\sqrt{m/M}$.

If we are working with a haemacytometer in which the volume over each square
is 1/40,000 cubic mm there will be $40{,}000{,}000\,m$ particles per c.c. and the probable error
will be $40{,}000{,}000\times0.67449\sqrt{m/M}$.

Suppose now that we dilute the liquid to $q$ times its bulk; we shall then have
$m/q$ particles per square, and if we count $M$ squares as before, our probable error
for the number of particles per c.c. in the original solution will be
$40{,}000{,}000\times0.67449\times q\sqrt{\frac{m}{qM}}$, that is $40{,}000{,}000\times0.67449\sqrt{\frac{qm}{M}}$.

That is, we shall have to count qM squares in order to be as accurate as before. 
So that the same accuracy is obtained by counting the same number of particles 
whatever the dilution, or, to look at it from a slightly different point of view, 
whatever be the size of the unit of area adopted. 
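The dilution argument can be checked numerically. The Python sketch below is illustrative only. The mean of 4 particles per square and the 400 squares are arbitrary choices. The 40,000,000 factor is the one used in the text, and the function name is hypothetical. It shows that diluting to twice the bulk raises the probable error by $\sqrt{2}$, and that counting twice as many squares restores it.

```python
import math

PE_FACTOR = 0.67449              # probable error = 0.67449 x standard deviation
PER_CC_PER_SQUARE = 40_000_000   # 1 c.c. divided by 1/40,000 cubic mm per square


def probable_error_per_cc(m: float, squares: int, dilution: float = 1.0) -> float:
    """Probable error of particles per c.c. in the ORIGINAL liquid.

    m is the mean count per square before dilution. After diluting to
    `dilution` times its bulk each square holds m / dilution on average, and
    for this distribution the variance of a square's count equals its mean."""
    m_diluted = m / dilution
    pe_of_mean = PE_FACTOR * math.sqrt(m_diluted / squares)
    return PER_CC_PER_SQUARE * dilution * pe_of_mean


base = probable_error_per_cc(4.0, 400)
print(base)
print(probable_error_per_cc(4.0, 400, dilution=2) / base)  # about sqrt(2)
print(probable_error_per_cc(4.0, 800, dilution=2) / base)  # about 1: twice the squares
```

In both equally accurate cases the number of particles actually counted, $(m/q)\times qM = mM$, is the same.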

Hence the most accurate way is to dilute the solution to the point at which the
particles may be counted most rapidly, and to count as many as time permits:
then the probable error of the mean is $0.67449\sqrt{m/M}$, where $m$ is the mean and $M$
is the number of unit areas counted over, squares, columns of squares, microscope
fields, or whatever unit be selected.

But owing to the difficulty of obtaining a drop representative of the bulk of 
the liquid the larger errors will probably be due to this cause, and it is usual to 
take several drops: if two of these differ in their means by a significant amount 

compared with the probable error (which is $0.67449\sqrt{(m_1+m_2)/M}$, where $m_1$, $m_2$ are
the means and $M$ the number of unit areas counted), it is probable that one at
least of the drops does not represent the bulk of the solution. 
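The comparison of two drops can be sketched in Python. The text does not fix how many probable errors count as "significant", so the hypothetical function below only reports the difference of the means in units of $0.67449\sqrt{(m_1+m_2)/M}$ and leaves the judgment to the reader. The example means are invented for illustration.

```python
import math

PE_FACTOR = 0.67449


def drops_differ_by(m1: float, m2: float, squares: int) -> float:
    """Difference between two drop means in units of its probable error,
    0.67449 * sqrt((m1 + m2) / M), with M squares counted on each drop."""
    pe_difference = PE_FACTOR * math.sqrt((m1 + m2) / squares)
    return abs(m1 - m2) / pe_difference


# Hypothetical counts: two drops, 400 squares each
print(round(drops_differ_by(4.8, 4.4, 400), 2))  # 3.91
```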


Experimental Work 

This theoretical work was tested on four distributions* which had been counted 
over the whole 400 squares of the haemacytometer. The particles counted were 
yeast cells which were killed by adding a little mercuric chloride to the water in 
which they had been shaken up. A small quantity of this was mixed with a 
10 % solution of gelatine, and after being well stirred up drops were put on the 
haemacytometer. This was then put on a plate of glass kept at a temperature just 
above the setting point of gelatine and allowed to cool slowly till the gelatine had 
set. Four different concentrations were used. 

* One of these is given in Table I.
In this way it was possible to count at leisure without fear of the cells straying
from one square to another owing to accidental vibrations. A few cells stuck here 
and there to the cover glass, but, as they appeared to be fairly uniformly distributed 
and were very few compared with those that sank to the bottom, they were 
neglected: had the object of the experiment been to find the number of cells 
present they would have been counted by microscope fields, and correction made 
for them; but in our case they were considered to belong to a different “population”
to those which sank.

Those cells which touched the bottom and right-hand lines of a square were 
considered to belong to the square; a convention of this kind is necessary as the 
cells have a tendency to settle on the lines. 

There was some difficulty owing to the buds of some cells remaining undetached 
in spite of much shaking. In such cases an obvious bud was not counted, but 
sometimes, no doubt, a bud was counted as a separate cell, which slightly increases 
the number of squares with large numbers in them. 

In order to test whether there was any local lack of homogeneity the correlation 
was determined between the number of cells on a square and the number of cells 
on each of the four squares nearest it; if from any cause there had been a tendency 
to lie closer together in some parts than in others this correlation would have been 
significantly positive. 

Distributions 3 and 4 were tested in this way (Table II), with the result that
the correlation coefficients were 0.016 ± 0.037 and 0.015 ± 0.037. This is satisfactory
as showing that there is no very great difficulty in putting the drop on to
the slide so as to be able to count at any point and in any order; as good a result 
may be expected from counting a column as from counting the same number of 
squares at random. 
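The test for local lack of homogeneity can be sketched in Python, but only under stated assumptions. Each square is paired with every nearest neighbour that lies inside the grid, so edge squares contribute fewer pairs. The counts are simulated from the exponential law with mean 1.8 and seed 1907. Both of these choices are illustrative, and Student's own tabulation is not reproduced. For well-mixed counts the correlation should be near zero. A smooth gradient across the slide, of the kind the test is meant to detect, makes it strongly positive.

```python
import math
import random


def poisson_sample(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for small means like these."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def neighbour_correlation(grid: list[list[int]]) -> float:
    """Correlation between each square's count and the count on each of its
    nearest neighbours (up, down, left, right) that lies inside the grid."""
    xs: list[int] = []
    ys: list[int] = []
    rows, cols = len(grid), len(grid[0])
    for i in range(rows):
        for j in range(cols):
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                a, b = i + di, j + dj
                if 0 <= a < rows and 0 <= b < cols:
                    xs.append(grid[i][j])
                    ys.append(grid[a][b])
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Simulated, not real, counts: 20 x 20 squares, mean 1.8 cells per square
rng = random.Random(1907)
uniform = [[poisson_sample(1.8, rng) for _ in range(20)] for _ in range(20)]
print(round(neighbour_correlation(uniform), 3))
```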

The actual distributions of cells are given below, and compared with those 
calculated on the supposition that they are random samples from a population 
following the law which we have investigated: the probability P of a worse fit 
occurring by chance is then found. 
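The probability $P$ can be computed directly rather than read from tables. In the Python sketch below, the degrees of freedom are assumed to equal the number of groups less one. That convention appears to reproduce the printed values. For example, $\chi^2 = 9.92$ with 4 degrees of freedom gives $P \approx 0.04$. Student's grouping of the tails is not stated, so the grouping in the example ("6 or more" for distribution II) is an assumption. With that grouping, the statistic for distribution II comes out a little below the printed 3.98. The function names are hypothetical.

```python
import math


def chi2_p_value(chi2: float, df: int) -> float:
    """P(a chi-square variable with df degrees of freedom exceeds chi2), from
    the series for the regularized lower incomplete gamma P(df/2, chi2/2)."""
    a, x = df / 2, chi2 / 2
    if x <= 0:
        return 1.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-15 * total:
        term *= x / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(x) - x - math.lgamma(a))
    return 1.0 - lower


def poisson_expected(mean: float, groups: int, total: int = 400) -> list[float]:
    """Expected squares with 0, 1, ..., groups-2 cells; the last group
    collects 'groups-1 or more' cells."""
    expected, prob = [], math.exp(-mean)
    for r in range(groups - 1):
        expected.append(total * prob)
        prob *= mean / (r + 1)
    expected.append(total - sum(expected))
    return expected


def chi2_statistic(observed: list[int], expected: list[float]) -> float:
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Distribution II: squares containing 0, 1, ..., 5 and "6 or more" cells
observed_ii = [103, 143, 98, 42, 8, 4, 2]
expected_ii = poisson_expected(1.3225, len(observed_ii))
stat = chi2_statistic(observed_ii, expected_ii)
print(round(stat, 2), round(chi2_p_value(stat, len(observed_ii) - 1), 2))
```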


I. Mean = 0.6825; $\mu_2$ = 0.8117; $\mu_3$ = 1.0876.

Containing     0     1     2     3     4     5  cells
Actual       213   128    37    18     3     1
Calculated   202   138    47    11  1.84  0.24

Whence $\chi^2$ = 9.92 and P = 0.04.

Best-fitting binomial $(1.1893 - 0.1893)^{-3.6054} \times 400$, for which P = 0.52.


II. Mean = 1.3225; $\mu_2$ = 1.2835; $\mu_3$ = 1.3574.

Containing     0     1     2     3     4     5     6  cells
Actual       103   143    98    42     8     4     2
Calculated   106   141    93    41    14     4     1

Whence $\chi^2$ = 3.98 and P = 0.68.

Best-fitting binomial $(0.97051 + 0.02949)^{46.2084} \times 400$, for which P = 0.72.
III. Mean = 1.80; $\mu_2$ = 1.96; $\mu_3$ = 2.529.

Containing     0     1     2     3     4     5     6     7     8     9  cells
Actual        75   103   121    54    30    13     2     1     0     1
Calculated    66   119   107    64    29    10     3     1     0     0

Whence $\chi^2$ = 9.03 and P = 0.25.

Best-fitting binomial $(1.0889 - 0.0889)^{-20.2473} \times 400$, for which P = 0.37.

IV. Mean = 4.68; $\mu_2$ = 4.46; $\mu_3$ = 4.98.

Containing     0     1     2     3     4     5     6     7     8     9    10    11    12  cells
Actual         0    20    43    53    86    70    54    37    18    10     5     2     2
Calculated     4    17    41    63    74    70    54    36    21    11     5     2     1

Whence $\chi^2$ = 9.72 and P = 0.64.

Best-fitting binomial $(0.9525 + 0.0475)^{98.53} \times 400$, for which P = 0.68.


These results are given graphically in Diagram II, on the next page. 

It is possible to fit a point binomial from the mean and the second moment
according to the two equations $\mu'_1 = nq$, $\mu_2 = npq$, and these point binomials fit
the observations better than the exponential series, but the constants have no
physical meaning except that $nq = m$. And since the exponential series is a
particular form of the point binomial and is fitted from one constant, while two
are used for the ad hoc binomial, this better fit was only to be expected.
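The two-equation fit can be written out directly. A minimal sketch, assuming the point binomial $(p + q)^n$ with $p + q = 1$ and the equations $\mu'_1 = nq$, $\mu_2 = npq$; the function name is hypothetical. When $\mu_2$ exceeds the mean, as in distributions I and III, $p$ comes out greater than 1 and both $q$ and $n$ are negative, which is why the constants have no physical meaning.

```python
def fit_point_binomial(mean, mu2):
    """Solve mean = n*q and mu2 = n*p*q subject to p + q = 1."""
    p = mu2 / mean          # npq / nq
    q = 1.0 - p
    n = mean / q
    return p, q, n

# Distributions I and III, using the printed means and second moments.
p1, q1, n1 = fit_point_binomial(0.6825, 0.8117)
p3, q3, n3 = fit_point_binomial(1.80, 1.96)
```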

It will be noticed that in both I and III the second moment is greater than the 
mean, due to an excess over the calculated among the high numbers in the tail of 
the distribution. As was pointed out before, the budding of the yeast cell increases
these high numbers, and there is also probably a tendency to stick together in
groups which was not altogether abolished even by vigorous shaking. 

In any case, the probabilities 0.04, 0.68, 0.25 and 0.64, though not particularly
high, are not at all unlikely in four trials, supposing our theoretical law to hold, 
and we are not likely to be very far wrong in assuming it to do so. 

Let us now apply it to a practical problem: for some purposes it is customary 
to estimate the concentration of cells and then dilute so that each two drops of 
liquid contain on an average one cell. Different flasks are then seeded with one
drop of the liquid in each, and then "most of those flasks which show growths
are pure cultures".

The exact distribution is given by

$$e^{-\frac{1}{2}}\left(1 + \frac{1}{2} + \frac{(\frac{1}{2})^2}{2!} + \frac{(\frac{1}{2})^3}{3!} + \cdots\right),$$

which is

No. of yeast cells        0       1      2      3      4
Percentage frequency  60.65   30.33   7.58   1.26   0.16

or approximately three-quarters of those which show growth are pure cultures.
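The arithmetic behind the "three-quarters" figure can be checked with a few lines of Python. This sketch assumes a mean of one cell per two drops ($m = 0.5$) and treats a pure culture as a flask that received exactly one cell.

```python
import math

m = 0.5  # mean number of cells per drop
terms = [math.exp(-m) * m ** r / math.factorial(r) for r in range(5)]
percentages = [round(100 * t, 2) for t in terms]
# Of the flasks that show growth (one or more cells), the fraction seeded with exactly one cell:
pure_fraction = terms[1] / (1 - terms[0])
print(percentages, round(pure_fraction, 3))
```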



Conclusions

We have seen that the distribution of small particles in a liquid follows the law

$$e^{-m}\left(1 + m + \frac{m^2}{2!} + \frac{m^3}{3!} + \cdots + \frac{m^r}{r!} + \cdots\right),$$

where $m$ is the mean number of particles per unit volume,* and the various terms
in the series give the chances that a given unit volume contains 0, 1, 2, ..., r, ...
particles. We have also seen that this series represents the limit to which any
point binomial $(p + q)^n$ approaches when $q$ is small, insomuch that even
$\left(\frac{19}{20} + \frac{1}{20}\right)^{100} \times 1000$ is represented by $e^{-5}\left(1 + 5 + \frac{5^2}{2!} + \cdots + \frac{5^r}{r!} + \cdots\right) \times 1000$ with
a maximum error of about 4.5 in 180.
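The size of that error can be checked term by term. A minimal sketch, assuming the binomial in question is $(19/20 + 1/20)^{100}$ so that the matching exponential series has $m = nq = 5$; both are scaled to a total of 1000.

```python
from math import comb, exp, factorial

n, q = 100, 1 / 20            # the point binomial (19/20 + 1/20)^100
m = n * q                     # = 5, the mean of the exponential series
binomial = [1000 * comb(n, r) * q ** r * (1 - q) ** (n - r) for r in range(n + 1)]
exponential = [1000 * exp(-m) * m ** r / factorial(r) for r in range(n + 1)]
errors = [abs(b - e) for b, e in zip(binomial, exponential)]
worst = max(range(n + 1), key=errors.__getitem__)
print(worst, round(binomial[worst], 1), round(errors[worst], 2))
```

The largest discrepancy falls at r = 5, near the mode, where the binomial term is about 180.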

For the rough calculation of odds with $n$ small compared to $1/q$ the exponential
series may be used instead of the binomial as being less laborious.

Finally, we have found that the standard deviation of the mean number of
particles per unit volume is $\sqrt{m/M}$, where $m$ is the mean number and $M$ the number
of unit volumes counted, so that the criterion of whether two solutions contain
different numbers of cells is whether $m_1 - m_2$ is significant compared with

$$0.67449\sqrt{\frac{m_1}{M_1} + \frac{m_2}{M_2}}.$$
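The criterion can be sketched as a small helper. This assumes, as in the text, that counts per unit volume follow the exponential series so that the variance of a mean count is $m/M$, and that 0.67449 converts a standard deviation into a probable error; the function name and the counts below are hypothetical.

```python
import math

def count_difference(m1, M1, m2, M2):
    """m = mean count per unit volume, M = number of unit volumes counted.
    Returns the difference of the means and its probable error (0.67449 x s.d.)."""
    diff = m1 - m2
    sd = math.sqrt(m1 / M1 + m2 / M2)
    return diff, 0.67449 * sd

# Hypothetical counts: two solutions, each counted over 400 squares.
diff, pe = count_difference(4.68, 400, 4.40, 400)
print(round(diff, 2), round(pe, 4))
```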

Table I 


Distribution of Yeast Cells over 1 sq. mm. divided into 400 squares 



* The prism standing on unit area. 








It must be noted, however, that the probable error will always be greater
than that calculated on this formula when for any reason the organisms occur
as aggregates of varying size.

In conclusion, I should like to thank Prof. Adrian J. Brown, of Birmingham 
University, for his valuable advice and assistance in carrying out the experimental 
part of the inquiry. 

Table II 


“Centre” squares 



1 

2 

3 

4 

5 

6 

7 

8 

9 

10 

11 

12 

Totals 

1 

2 

3 

4 

5 

6 

7 

8 

9 

10 

11 

12 

6 

6 

8 

18 

15 

9 

5 

3 

2 

6 

14 

15 
34 
24 

17 

12 

5 

6 

1 

1 

1 

9 

17 

25 

33 

37 

25 

14 

7 

7 

1 

4 

1 

15 

31 

32 

45 

47 

39 

21 

8 

5 

4 

1 

15 

24 

37 

48 

39 

34 

19 

12 

10 

4 

1 

1 

9 

17 

20 

41 

37 

32 

16 

8 

2 

4 

1 

1 

4 

10 

15 

22 

18 

14 

9 

6 

2 

3 

5 

7 

7 

12 

8 

7 

1 

3 

3 

2 

6 

7 

5 

11 

2 

3 

3 

2 

1 

4 

4 

4 

4 

1 

1 

1 

4 

1 

1 

1 

1 

2 

1 

69 

134 

171 

258 

247 

186 

106 

57 

38 

18 

8 

4 

Totals 

72 

136 

180 

248 

244 

188 

100 

56 

40 

20 

8 | 4 

1296 


Mean of "Centre" squares, 4.6821; s.d. 2.139.

Mean of "Adjacent" squares, 4.7014; s.d. 2.116.
r = +0.016 ± 0.037.

Correlation table between the number of cells in a square and the numbers of cells in the four adjacent
squares taken all over Table I.







THE PROBABLE ERROR OF A MEAN 

[Biometrika, VI (1908), p. 1]

Introduction 

Any experiment may be regarded as forming an individual of a "population"
of experiments which might be performed under the same conditions. A series
of experiments is a sample drawn from this population.

Now any series of experiments is only of value in so far as it enables us to form
a judgment as to the statistical constants of the population to which the experiments
belong. In a great number of cases the question finally turns on the value
of a mean, either directly, or as the mean difference between the two quantities.

If the number of experiments be very large, we may have precise information
as to the value of the mean, but if our sample be small, we have two sources of
uncertainty: (1) owing to the "error of random sampling" the mean of our series
of experiments deviates more or less widely from the mean of the population, and
(2) the sample is not sufficiently large to determine what is the law of distribution
of individuals. It is usual, however, to assume a normal distribution, because, in
a very large number of cases, this gives an approximation so close that a small
sample will give no real information as to the manner in which the population
deviates from normality: since some law of distribution must be assumed it is
better to work with a curve whose area and ordinates are tabled, and whose
properties are well known. This assumption is accordingly made in the present
paper, so that its conclusions are not strictly applicable to populations known not
to be normally distributed; yet it appears probable that the deviation from
normality must be very extreme to lead to serious error. We are concerned here
solely with the first of these two sources of uncertainty.

The usual method of determining the probability that the mean of the population
lies within a given distance of the mean of the sample is to assume a normal
distribution about the mean of the sample with a standard deviation equal to
$s/\sqrt{n}$, where $s$ is the standard deviation of the sample, and to use the tables of
the probability integral.

But, as we decrease the number of experiments, the value of the standard
deviation found from the sample of experiments becomes itself subject to an
increasing error, until judgments reached in this way may become altogether
misleading.
In routine work there are two ways of dealing with this difficulty: (1) an
experiment may be repeated many times, until such a long series is obtained that
the standard deviation is determined once and for all with sufficient accuracy.
This value can then be used for subsequent shorter series of similar experiments.

(2) Where experiments are done in duplicate in the natural course of the work,
the root mean square of the difference between corresponding pairs is equal to the
standard deviation of the population multiplied by $\sqrt{2}$. We can thus combine
together several series of experiments for the purpose of determining the standard
deviation. Owing however to secular change, the value obtained is nearly always
too low, successive experiments being positively correlated.
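The duplicate method in (2) reduces to one line of arithmetic. A minimal sketch, assuming the two members of each pair are independent determinations with the same standard deviation σ, so that each difference has variance $2\sigma^2$; the function name and figures are hypothetical. Positively correlated duplicates would make this estimate too low, as the text warns.

```python
import math

def sigma_from_duplicates(pairs):
    """Estimate the population s.d. from duplicate determinations (a, b).
    Each difference a - b has variance 2*sigma**2 if the duplicates are independent."""
    diffs = [a - b for a, b in pairs]
    mean_square = sum(d * d for d in diffs) / len(diffs)
    return math.sqrt(mean_square / 2)

print(sigma_from_duplicates([(10, 12), (7, 5), (9, 9)]))
```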

There are other experiments, however, which cannot easily be repeated very
often; in such cases it is sometimes necessary to judge of the certainty of the
results from a very small sample, which itself affords the only indication of the
variability. Some chemical, many biological, and most agricultural and large-scale
experiments belong to this class, which has hitherto been almost outside the range
of statistical inquiry.

Again, although it is well known that the method of using the normal curve
is only trustworthy when the sample is "large", no one has yet told us very
clearly where the limit between "large" and "small" samples is to be drawn.

The aim of the present paper is to determine the point at which we may use
the tables of the probability integral in judging of the significance of the mean of
a series of experiments, and to furnish alternative tables for use when the number
of experiments is too few.

The paper is divided into the following nine sections:

I. The equation is determined of the curve which represents the frequency
distribution of standard deviations of samples drawn from a normal population.

II. There is shown to be no kind of correlation between the mean and the
standard deviation of such a sample.

III. The equation is determined of the curve representing the frequency distribution
of a quantity $z$, which is obtained by dividing the distance between
the mean of a sample and the mean of the population by the standard deviation
of the sample.

IV. The curve found in I is discussed.

V. The curve found in III is discussed.

VI. The two curves are compared with some actual distributions.

VII. Tables of the curves found in III are given for samples of different size.

VIII and IX. The tables are explained and some instances are given of their use.

X. Conclusions.
Section I

Samples of n individuals are drawn out of a population distributed normally, 
to find an equation which shall represent the frequency of the standard deviations 
of these samples. 

If $s$ be the standard deviation found from a sample $x_1, x_2, \ldots, x_n$ (all these being
measured from the mean of the population), then

$$s^2 = \frac{S(x_1^2)}{n} - \left(\frac{S(x_1)}{n}\right)^2 = \frac{S(x_1^2)}{n} - \frac{S(x_1^2)}{n^2} - \frac{2S(x_1x_2)}{n^2}.$$

Summing for all samples and dividing by the number of samples we get the
mean value of $s^2$, which we will write $\overline{s^2}$:

$$\overline{s^2} = \frac{n\mu_2}{n} - \frac{n\mu_2}{n^2} = \frac{\mu_2(n-1)}{n},$$

where $\mu_2$ is the second moment coefficient in the original normal distribution of $x$:
since $x_1$, $x_2$, etc. are not correlated and the distribution is normal, products
involving odd powers of $x_1$ vanish on summing, so that $2S(x_1x_2)/n^2$ is equal to 0.

If $M'_R$ represent the $R$th moment coefficient of the distribution of $s^2$ about the
end of the range where $s^2 = 0$,

$$M'_1 = \mu_2\,\frac{n-1}{n}.$$

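Again

$$s^4 = \left\{\frac{S(x_1^2)}{n} - \left(\frac{S(x_1)}{n}\right)^2\right\}^2 = \left(\frac{S(x_1^2)}{n}\right)^2 - 2\,\frac{S(x_1^2)}{n}\left(\frac{S(x_1)}{n}\right)^2 + \left(\frac{S(x_1)}{n}\right)^4$$

$$= \frac{S(x_1^4)}{n^2} + \frac{2S(x_1^2x_2^2)}{n^2} - \frac{2S(x_1^4)}{n^3} - \frac{4S(x_1^2x_2^2)}{n^3} + \frac{S(x_1^4)}{n^4} + \frac{6S(x_1^2x_2^2)}{n^4}$$

+ other terms involving odd powers of $x_1$, etc., which will vanish on summation.

Now $S(x_1^4)$ has $n$ terms, but $S(x_1^2x_2^2)$ has $\frac{1}{2}n(n-1)$, hence summing for all
samples and dividing by the number of samples, we get

$$M'_2 = \frac{\mu_4}{n} + \mu_2^2\,\frac{n-1}{n} - \frac{2\mu_4}{n^2} - 2\mu_2^2\,\frac{n-1}{n^2} + \frac{\mu_4}{n^3} + 3\mu_2^2\,\frac{n-1}{n^3}$$

$$= \frac{\mu_4}{n^3}(n^2 - 2n + 1) + \frac{\mu_2^2}{n^3}(n-1)(n^2 - 2n + 3).$$

Now since the distribution of $x$ is normal, $\mu_4 = 3\mu_2^2$, hence

$$M'_2 = \mu_2^2\,\frac{n-1}{n^3}\{3n - 3 + n^2 - 2n + 3\} = \mu_2^2\,\frac{(n-1)(n+1)}{n^2}.$$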
In a similar tedious way I find

$$M'_3 = \mu_2^3\,\frac{(n-1)(n+1)(n+3)}{n^3}$$

and

$$M'_4 = \mu_2^4\,\frac{(n-1)(n+1)(n+3)(n+5)}{n^4}.$$

The law of formation of these moment coefficients appears to be a simple one,
but I have not seen my way to a general proof.

If now $M_R$ be the $R$th moment coefficient of $s^2$ about its mean, we have

$$M_2 = \mu_2^2\,\frac{n-1}{n^2}\{(n+1) - (n-1)\} = 2\mu_2^2\,\frac{n-1}{n^2},$$

$$M_3 = \mu_2^3\left\{\frac{(n-1)(n+1)(n+3)}{n^3} - \frac{3(n-1)}{n}\cdot\frac{2(n-1)}{n^2} - \frac{(n-1)^3}{n^3}\right\} = \mu_2^3\,\frac{n-1}{n^3}\{n^2 + 4n + 3 - 6n + 6 - n^2 + 2n - 1\} = 8\mu_2^3\,\frac{n-1}{n^3},$$

$$M_4 = \frac{\mu_2^4}{n^4}\{(n-1)(n+1)(n+3)(n+5) - 32(n-1)^2 - 12(n-1)^3 - (n-1)^4\}$$

$$= \frac{\mu_2^4(n-1)}{n^4}\{n^3 + 9n^2 + 23n + 15 - 32n + 32 - 12n^2 + 24n - 12 - n^3 + 3n^2 - 3n + 1\} = \frac{12\mu_2^4(n-1)(n+3)}{n^4}.$$

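Hence

$$\beta_1 = \frac{M_3^2}{M_2^3} = \frac{8}{n-1}, \qquad \beta_2 = \frac{M_4}{M_2^2} = \frac{3(n+3)}{n-1},$$

$$\therefore\ 2\beta_2 - 3\beta_1 - 6 = \frac{1}{n-1}\{6(n+3) - 24 - 6(n-1)\} = 0.$$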

Consequently a curve of Prof. Pearson’s Type III may be expected to fit the 
distribution of s 2 . 
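The moment results above can be checked by simulation. This is a rough sketch, assuming a unit normal population (so $\mu_2 = 1$) and the paper's divisor $n$ in $s^2$; with these assumptions the theory gives a mean of $(n-1)/n$ and a variance of $2(n-1)/n^2$ for $s^2$. The helper names are hypothetical, and the agreement is only up to sampling error.

```python
import random
import statistics

def s_squared(sample):
    """Sample variance with divisor n, as used throughout the paper."""
    n = len(sample)
    mean = sum(sample) / n
    return sum((x - mean) ** 2 for x in sample) / n

def simulate_s_squared(n, trials=50_000, seed=1):
    """Draw samples of n from a unit normal population (mu_2 = 1)."""
    rng = random.Random(seed)
    values = [s_squared([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(trials)]
    return statistics.fmean(values), statistics.pvariance(values)

n = 4
mean_s2, var_s2 = simulate_s_squared(n)
# Theory for mu_2 = 1: M'_1 = (n - 1)/n = 0.75 and M_2 = 2(n - 1)/n**2 = 0.375.
print(round(mean_s2, 3), round(var_s2, 3))
```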

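The equation referred to an origin at the zero end of the curve will be

$$y = Cx^p e^{-\gamma x},$$

where

$$\gamma = \frac{2M_2}{M_3} = \frac{4\mu_2^2(n-1)n^3}{8n^2\mu_2^3(n-1)} = \frac{n}{2\mu_2}$$

and

$$p = \frac{4}{\beta_1} - 1 = \frac{n-1}{2} - 1 = \frac{n-3}{2}.$$

Consequently the equation becomes

$$y = Cx^{\frac{n-3}{2}}e^{-\frac{nx}{2\mu_2}},$$

which will give the distribution of $s^2$.

The area of this curve is

$$C\int_0^\infty x^{\frac{n-3}{2}}e^{-\frac{nx}{2\mu_2}}\,dx = I \text{ (say)}.$$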
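The first moment coefficient about the end of the range will therefore be

$$\frac{C\int_0^\infty x^{\frac{n-1}{2}}e^{-\frac{nx}{2\mu_2}}\,dx}{I} = \frac{C\left[-\frac{2\mu_2}{n}x^{\frac{n-1}{2}}e^{-\frac{nx}{2\mu_2}}\right]_{x=0}^{x=\infty} + C\int_0^\infty \frac{n-1}{n}\,\mu_2\,x^{\frac{n-3}{2}}e^{-\frac{nx}{2\mu_2}}\,dx}{I}.$$

The first part vanishes at each limit and the second is equal to

$$\frac{n-1}{n}\,\mu_2\,\frac{I}{I} = \frac{n-1}{n}\,\mu_2,$$

and we see that the higher moment coefficients will be formed by multiplying
successively by $\frac{n+1}{n}\mu_2$, $\frac{n+3}{n}\mu_2$, etc., just as appeared to be the law of formation
of $M'_2$, $M'_3$, $M'_4$, etc.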

Hence it is probable that the curve found represents the theoretical distribution
of $s^2$; so that although we have no actual proof we shall assume it to do so in
what follows.

The distribution of $s$ may be found from this, since the frequency of $s$ is equal
to that of $s^2$ and all that we must do is to compress the base line suitably.

Now if $y_1 = \phi(s^2)$ be the frequency curve of $s^2$

and $y_2 = \psi(s)$ be the frequency curve of $s$,

then

$$y_1\,d(s^2) = y_2\,ds,$$

or

$$y_2\,ds = 2y_1 s\,ds,$$

$$\therefore\ y_2 = 2sy_1.$$

Hence

$$y_2 = 2Cs(s^2)^{\frac{n-3}{2}}e^{-\frac{ns^2}{2\mu_2}}$$

is the distribution of $s$.

This reduces to

$$y_2 = 2Cs^{n-2}e^{-\frac{ns^2}{2\mu_2}}.$$

Hence $y = Ax^{n-2}e^{-\frac{nx^2}{2\sigma^2}}$ will give the frequency distribution of standard deviations
of samples of $n$, taken out of a population distributed normally with standard
deviation $\sigma$. The constant $A$ may be found by equating the area of the curve as
follows:

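$$\text{Area} = A\int_0^\infty x^{n-2}e^{-\frac{nx^2}{2\sigma^2}}\,dx. \qquad \left(\text{Let } I_p \text{ represent } \int_0^\infty x^p e^{-\frac{nx^2}{2\sigma^2}}\,dx.\right)$$

Then

$$I_p = \frac{\sigma^2}{n}\int_0^\infty x^{p-1}\,\frac{d}{dx}\left(-e^{-\frac{nx^2}{2\sigma^2}}\right)dx = \frac{\sigma^2}{n}\left[-x^{p-1}e^{-\frac{nx^2}{2\sigma^2}}\right]_{x=0}^{x=\infty} + \frac{\sigma^2}{n}(p-1)\int_0^\infty x^{p-2}e^{-\frac{nx^2}{2\sigma^2}}\,dx = \frac{\sigma^2}{n}(p-1)I_{p-2},$$

since the first part vanishes at both limits.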
By continuing this process we find

$$I_{n-2} = \left(\frac{\sigma^2}{n}\right)^{\frac{n-2}{2}}(n-3)(n-5)\cdots 3\cdot 1\; I_0$$

or

$$I_{n-2} = \left(\frac{\sigma^2}{n}\right)^{\frac{n-3}{2}}(n-3)(n-5)\cdots 4\cdot 2\; I_1$$

according as $n$ is even or odd.


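But $I_0$ is

$$\int_0^\infty e^{-\frac{nx^2}{2\sigma^2}}\,dx = \sqrt{\frac{\pi}{2n}}\,\sigma,$$

and $I_1$ is

$$\int_0^\infty xe^{-\frac{nx^2}{2\sigma^2}}\,dx = \left[-\frac{\sigma^2}{n}e^{-\frac{nx^2}{2\sigma^2}}\right]_{x=0}^{x=\infty} = \frac{\sigma^2}{n}.$$

Hence if $n$ be even,

$$A = \frac{\text{Area}}{(n-3)(n-5)\cdots 3\cdot 1\,\sqrt{\frac{\pi}{2}}\left(\frac{\sigma}{\sqrt{n}}\right)^{n-1}},$$

and if $n$ be odd,

$$A = \frac{\text{Area}}{(n-3)(n-5)\cdots 4\cdot 2\left(\frac{\sigma}{\sqrt{n}}\right)^{n-1}}.$$

Hence the equation may be written

$$y = \frac{N}{(n-3)(n-5)\cdots 3\cdot 1}\sqrt{\frac{2}{\pi}}\left(\frac{n}{\sigma^2}\right)^{\frac{n-1}{2}}x^{n-2}e^{-\frac{nx^2}{2\sigma^2}} \quad (n \text{ even}),$$

or

$$y = \frac{N}{(n-3)(n-5)\cdots 4\cdot 2}\left(\frac{n}{\sigma^2}\right)^{\frac{n-1}{2}}x^{n-2}e^{-\frac{nx^2}{2\sigma^2}} \quad (n \text{ odd}),$$

where $N$ as usual represents the total frequency.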
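A numerical check that the constant has been found correctly: the curve should have unit area when $N = 1$, and for samples of 4 its mean should be $2\sqrt{2/\pi}\,\sigma/\sqrt{4}$. This sketch assumes Python, Simpson's rule as the (swapped-in) quadrature, and an upper limit of $8\sigma$ as a stand-in for infinity; the helper names are hypothetical.

```python
import math

def s_curve(x, n, sigma=1.0, N=1.0):
    """Frequency curve of sample standard deviations (divisor n) for samples of n."""
    # (n-3)(n-5)...3.1 for even n, (n-3)(n-5)...4.2 for odd n; empty product = 1.
    product = math.prod(range(n - 3, 0, -2))
    const = N / product * (n / sigma ** 2) ** ((n - 1) / 2)
    if n % 2 == 0:
        const *= math.sqrt(2 / math.pi)
    return const * x ** (n - 2) * math.exp(-n * x * x / (2 * sigma ** 2))

def simpson(f, a, b, steps=2000):
    """Composite Simpson's rule with an even number of steps."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

areas = {n: simpson(lambda x: s_curve(x, n), 0.0, 8.0) for n in range(2, 11)}
mean_s_n4 = simpson(lambda x: x * s_curve(x, 4), 0.0, 8.0)
print({n: round(a, 8) for n, a in areas.items()}, round(mean_s_n4, 6))
```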

Section II 

To show that there is no correlation between (a) the distance of the mean of
a sample from the mean of the population and (b) the standard deviation of a
sample with normal distribution.

(1) Clearly positive and negative positions of the mean of the sample are 
equally likely, and hence there cannot be correlation between the absolute value 
of the distance of the mean from the mean of the population and the standard 
deviation, but (2) there might be correlation between the square of the distance 
and the square of the standard deviation. 

Let

$$u^2 = \left(\frac{S(x_1)}{n}\right)^2 \quad \text{and} \quad s^2 = \frac{S(x_1^2)}{n} - \left(\frac{S(x_1)}{n}\right)^2.$$

Then if $m'_1$, $M'_1$ be the mean values of $u^2$ and $s^2$, we have by the preceding part

$$m'_1 = \frac{\mu_2}{n} \quad \text{and} \quad M'_1 = \mu_2\,\frac{n-1}{n}.$$
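Now

$$u^2s^2 = \frac{S(x_1^2)}{n}\left(\frac{S(x_1)}{n}\right)^2 - \left(\frac{S(x_1)}{n}\right)^4 = \frac{S(x_1^4)}{n^3} + \frac{2S(x_1^2x_2^2)}{n^3} - \frac{S(x_1^4)}{n^4} - \frac{6S(x_1^2x_2^2)}{n^4}$$

+ other terms of odd order which will vanish on summation.

Summing for all values and dividing by the number of cases we get

$$R_{u^2s^2}\,\sigma_{u^2}\sigma_{s^2} + m'_1M'_1 = \frac{\mu_4}{n^2} + \mu_2^2\,\frac{n-1}{n^2} - \frac{\mu_4}{n^3} - 3\mu_2^2\,\frac{n-1}{n^3},$$

where $R_{u^2s^2}$ is the correlation between $u^2$ and $s^2$.

$$R_{u^2s^2}\,\sigma_{u^2}\sigma_{s^2} + \mu_2^2\,\frac{n-1}{n^2} = \mu_2^2\,\frac{n-1}{n^3}\{3 + n - 3\} = \mu_2^2\,\frac{n-1}{n^2}.$$

Hence $R_{u^2s^2}\,\sigma_{u^2}\sigma_{s^2} = 0$, or there is no correlation between $u^2$ and $s^2$.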
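The absence of correlation can be illustrated by simulation. A rough sketch, assuming a unit normal population with mean 0 (so $u$ is simply the sample mean) and the divisor $n$ for $s^2$; the estimate should lie within a few multiples of $1/\sqrt{\text{trials}}$ of zero. The helper names are hypothetical.

```python
import math
import random

def correlation(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def u2_s2_correlation(n, trials=50_000, seed=2):
    """Correlation between u**2 (squared distance of the sample mean from the
    population mean, here 0) and s**2, for samples of n from a unit normal."""
    rng = random.Random(seed)
    u2, s2 = [], []
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(xs) / n
        u2.append(mean * mean)
        s2.append(sum((x - mean) ** 2 for x in xs) / n)
    return correlation(u2, s2)

r_normal = u2_s2_correlation(5)
print(round(r_normal, 4))
```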


Section III 

To find the equation representing the frequency distribution of the means of
samples of $n$ drawn from a normal population, the mean being expressed in terms
of the standard deviation of the sample.

We have

$$y = \frac{C}{\sigma^{n-1}}\,s^{n-2}e^{-\frac{ns^2}{2\sigma^2}}$$

as the equation representing the distribution of $s$,
the standard deviation of a sample of $n$, when the samples are drawn from a
normal population with standard deviation $\sigma$.

Now the means of these samples of $n$ are distributed according to the equation*

$$y = \frac{\sqrt{n}\,N}{\sqrt{2\pi}\,\sigma}\,e^{-\frac{nx^2}{2\sigma^2}},$$

and we have shown that there is no correlation between $x$, the distance of the
mean of the sample, and $s$, the standard deviation of the sample.

Now let us suppose $x$ measured in terms of $s$, i.e. let us find the distribution
of $z = x/s$.

If we have $y_1 = \phi(x)$ and $y_2 = \psi(z)$ as the equations representing the frequency
of $x$ and of $z$ respectively, then

$$y_1\,dx = y_2\,dz = y_2\,\frac{dx}{s},$$

$$\therefore\ y_2 = s\,y_1.$$

* Airy, Theory of Errors of Observations, Part II, § 6.
Hence

$$y = \frac{\sqrt{n}\,Ns}{\sqrt{2\pi}\,\sigma}\,e^{-\frac{ns^2z^2}{2\sigma^2}}$$

is the equation representing the distribution of $z$ for samples of $n$ with standard
deviation $s$.

Now the chance that $s$ lies between $s$ and $s + ds$ is

$$\frac{\int_s^{s+ds}\frac{C}{\sigma^{n-1}}\,s^{n-2}e^{-\frac{ns^2}{2\sigma^2}}\,ds}{\int_0^\infty\frac{C}{\sigma^{n-1}}\,s^{n-2}e^{-\frac{ns^2}{2\sigma^2}}\,ds},$$

which represents the $N$ in the above equation.

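Hence the distribution of $z$ due to values of $s$ which lie between $s$ and $s + ds$ is

$$y = \frac{\int_s^{s+ds}\frac{C}{\sigma^{n}}\sqrt{\frac{n}{2\pi}}\,s^{n-1}e^{-\frac{ns^2(1+z^2)}{2\sigma^2}}\,ds}{\int_0^\infty\frac{C}{\sigma^{n-1}}\,s^{n-2}e^{-\frac{ns^2}{2\sigma^2}}\,ds} = \sqrt{\frac{n}{2\pi}}\;\frac{\int_s^{s+ds}s^{n-1}e^{-\frac{ns^2(1+z^2)}{2\sigma^2}}\,ds}{\sigma\int_0^\infty s^{n-2}e^{-\frac{ns^2}{2\sigma^2}}\,ds},$$

and summing for all values of $s$ we have as an equation giving the distribution of $z$

$$y = \sqrt{\frac{n}{2\pi}}\;\frac{\int_0^\infty s^{n-1}e^{-\frac{ns^2(1+z^2)}{2\sigma^2}}\,ds}{\sigma\int_0^\infty s^{n-2}e^{-\frac{ns^2}{2\sigma^2}}\,ds}.$$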

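By what we have already proved this reduces to

$$y = \frac{1}{2}\cdot\frac{n-2}{n-3}\cdot\frac{n-4}{n-5}\cdots\frac{5}{4}\cdot\frac{3}{2}\,(1+z^2)^{-\frac{n}{2}}, \quad \text{if } n \text{ be odd},$$

and to

$$y = \frac{1}{\pi}\cdot\frac{n-2}{n-3}\cdot\frac{n-4}{n-5}\cdots\frac{4}{3}\cdot\frac{2}{1}\,(1+z^2)^{-\frac{n}{2}}, \quad \text{if } n \text{ be even}.$$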

Since this equation is independent of $\sigma$ it will give the distribution of the
distance of the mean of a sample from the mean of the population expressed in
terms of the standard deviation of the sample for any normal population.
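The constant of this curve can be checked two ways. The sketch below assumes Python; it computes the constant from the products above, compares it with the equivalent closed form $\Gamma(n/2)/(\sqrt{\pi}\,\Gamma((n-1)/2))$ (a standard identity, not given in the paper), and integrates the curve in the substitution $z = \tan\theta$, where it becomes a constant times $\cos^{n-2}\theta$. Simpson's rule is a swapped-in quadrature and the function names are hypothetical.

```python
import math

def z_constant(n):
    """Constant of y = c (1 + z**2) ** (-n/2): (n-2)/(n-3) * (n-4)/(n-5) * ...,
    times 1/2 for odd n and 1/pi for even n."""
    c = 1.0
    for k in range(n - 2, 1, -2):
        c *= k / (k - 1)
    return c / 2 if n % 2 else c / math.pi

def z_constant_gamma(n):
    """Same constant via the gamma function: Gamma(n/2) / (sqrt(pi) Gamma((n-1)/2))."""
    return math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2)) / math.sqrt(math.pi)

def total_area(n, steps=2000):
    """Integrate c * cos(theta)**(n-2) over (-pi/2, pi/2), i.e. the whole curve in z = tan(theta)."""
    c = z_constant(n)
    a, b = -math.pi / 2, math.pi / 2
    h = (b - a) / steps
    f = lambda t: c * math.cos(t) ** (n - 2)
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

for n in (2, 3, 4, 10):
    print(n, z_constant(n), z_constant_gamma(n), total_area(n))
```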

Section IV. Some Properties of the Standard Deviation Frequency Curve

By a similar method to that adopted for finding the constant we may find the
mean and moments: thus the mean is at $I_{n-1}/I_{n-2}$, which is equal to

$$\frac{n-2}{n-3}\cdot\frac{n-4}{n-5}\cdots\frac{2}{1}\sqrt{\frac{2}{\pi}}\,\frac{\sigma}{\sqrt{n}}, \quad \text{if } n \text{ be even},$$

or

$$\frac{n-2}{n-3}\cdot\frac{n-4}{n-5}\cdots\frac{3}{2}\sqrt{\frac{\pi}{2}}\,\frac{\sigma}{\sqrt{n}}, \quad \text{if } n \text{ be odd}.$$
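The second moment about the end of the range is

$$\frac{I_n}{I_{n-2}} = \frac{(n-1)\sigma^2}{n}.$$

The third moment about the end of the range is equal to

$$\frac{I_{n+1}}{I_{n-2}} = \frac{I_{n+1}}{I_{n-1}}\cdot\frac{I_{n-1}}{I_{n-2}} = \sigma^2 \times \text{the mean}.$$

The fourth moment about the end of the range is equal to

$$\frac{I_{n+2}}{I_{n-2}} = \frac{(n-1)(n+1)}{n^2}\,\sigma^4.$$

If we write the distance of the mean from the end of the range $D\sigma/\sqrt{n}$ and the
moments about the end of the range $\nu_1$, $\nu_2$, etc., then

$$\nu_1 = \frac{D\sigma}{\sqrt{n}}, \quad \nu_2 = \frac{n-1}{n}\,\sigma^2, \quad \nu_3 = \frac{D\sigma^3}{\sqrt{n}}, \quad \nu_4 = \frac{n^2-1}{n^2}\,\sigma^4.$$

From this we get the moments about the mean:

$$\mu_2 = \frac{\sigma^2}{n}(n - 1 - D^2),$$

$$\mu_3 = \frac{\sigma^3}{n\sqrt{n}}\{nD - 3(n-1)D + 2D^3\} = \frac{\sigma^3D}{n\sqrt{n}}\{2D^2 - 2n + 3\},$$

$$\mu_4 = \frac{\sigma^4}{n^2}\{n^2 - 1 - 4D^2n + 6(n-1)D^2 - 3D^4\} = \frac{\sigma^4}{n^2}\{n^2 - 1 - D^2(3D^2 - 2n + 6)\}.$$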

It is of interest to find out what these become when $n$ is large.

In order to do this we must find out what is the value of $D$.

Now Wallis's expression for $\pi$ derived from the infinite product value of $\sin x$ is

$$\frac{\pi}{2}(2n+1) = \frac{2^2\cdot 4^2\cdot 6^2\cdots(2n)^2}{1^2\cdot 3^2\cdot 5^2\cdots(2n-1)^2}.$$

If we assume a quantity $\theta$ $\left(= a_0 + \frac{a_1}{n} + \text{etc.}\right)$ which we may add to the $2n + 1$
in order to make the expression approximate more rapidly to the truth, it is easy
to show that $\theta = -\frac{1}{2} + \frac{1}{16n}$, and we get*

$$\frac{\pi}{2}\left(2n + \frac{1}{2} + \frac{1}{16n}\right) = \frac{2^2\cdot 4^2\cdot 6^2\cdots(2n)^2}{1^2\cdot 3^2\cdot 5^2\cdots(2n-1)^2}.$$

From this we find that whether $n$ be even or odd $D^2$ approximates to $n - \frac{3}{2} + \frac{1}{8n}$
when $n$ is large.

* This expression will be found to give a much closer approximation to $\pi$ than Wallis's.
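How good is the approximation for $D^2$? The sketch below assumes Python; it computes $D$ exactly from the product formulae for the mean given above, cross-checks it against the closed form $\sqrt{2}\,\Gamma(n/2)/\Gamma((n-1)/2)$ (a standard identity, not stated in the paper), and compares $D^2$ with $n - \frac{3}{2} + \frac{1}{8n}$. The function names are hypothetical.

```python
import math

def D_from_products(n):
    """D = (n-2)/(n-3) * (n-4)/(n-5) * ... times sqrt(2/pi) (n even) or sqrt(pi/2) (n odd)."""
    d = 1.0
    for k in range(n - 2, 1, -2):
        d *= k / (k - 1)
    return d * (math.sqrt(2 / math.pi) if n % 2 == 0 else math.sqrt(math.pi / 2))

def D_from_gamma(n):
    """Equivalent closed form: sqrt(2) * Gamma(n/2) / Gamma((n-1)/2)."""
    return math.sqrt(2) * math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2))

def D2_large_n(n):
    """The large-n approximation quoted in the text."""
    return n - 1.5 + 1 / (8 * n)

for n in (4, 10, 20, 40):
    print(n, round(D_from_products(n) ** 2, 6), round(D2_large_n(n), 6))
```

Already at n = 10 the two values of $D^2$ agree to about two parts in a thousand.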
Substituting this value of $D$ we get

$$\mu_2 = \frac{\sigma^2}{2n}\left(1 - \frac{1}{4n}\right), \qquad \mu_3 = \frac{\sigma^3}{4n^2}\sqrt{1 - \frac{3}{2n} + \frac{1}{8n^2}}, \qquad \mu_4 = \frac{3\sigma^4}{4n^2}\left(1 + \frac{1}{2n} - \frac{1}{16n^2}\right).$$

Consequently the value of the standard deviation of a standard deviation which
we have found, $\frac{\sigma}{\sqrt{2n}}\sqrt{1 - \frac{1}{4n}}$, becomes the same as that found for the normal
curve by Prof. Pearson, $\frac{\sigma}{\sqrt{2n}}$, when $n$ is large enough to neglect the $\frac{1}{4n}$ in
comparison with 1.

Neglecting terms of lower order than $1/n$, we find

P 1 = n(4n-Z)’ ^ a = 3 ( 1_ 2^)( 1 + 2^:)' 

Consequently, as $n$ increases, $\beta_2$ very soon approaches the value 3 of the normal
curve, but $\beta_1$ vanishes more slowly, so that the curve remains slightly skew.


Diagram I. Frequency Curve giving the Distribution of Standard Deviations of samples
of 10 taken from a Normal Population



Diagram I shows the theoretical distribution of the standard deviations found 
from samples of 10. 
Section V. Some Properties of the Curve

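$$y = \frac{n-2}{n-3}\cdot\frac{n-4}{n-5}\cdots\frac{4}{3}\cdot\frac{2}{1}\cdot\frac{1}{\pi}\,(1+z^2)^{-\frac{n}{2}} \quad \text{if } n \text{ be even},$$

$$y = \frac{n-2}{n-3}\cdot\frac{n-4}{n-5}\cdots\frac{5}{4}\cdot\frac{3}{2}\cdot\frac{1}{2}\,(1+z^2)^{-\frac{n}{2}} \quad \text{if } n \text{ be odd}.$$

Writing $z = \tan\theta$ the equation becomes $y = \frac{n-2}{n-3}\cdot\frac{n-4}{n-5}\cdots\text{etc.}\times\cos^n\theta$, which
affords an easy way of drawing the curve. Also $dz = d\theta/\cos^2\theta$.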

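Hence to find the area of the curve between any limits we must find

$$\frac{n-2}{n-3}\cdot\frac{n-4}{n-5}\cdots\text{etc.}\times\int\cos^{n-2}\theta\,d\theta$$

$$= \frac{n-2}{n-3}\cdot\frac{n-4}{n-5}\cdots\text{etc.}\left\{\frac{n-3}{n-2}\int\cos^{n-4}\theta\,d\theta + \frac{1}{n-2}\Big[\cos^{n-3}\theta\sin\theta\Big]\right\}$$

$$= \frac{n-4}{n-5}\cdot\frac{n-6}{n-7}\cdots\text{etc.}\int\cos^{n-4}\theta\,d\theta + \frac{1}{n-3}\cdot\frac{n-4}{n-5}\cdots\text{etc.}\Big[\cos^{n-3}\theta\sin\theta\Big],$$

and by continuing the process the integral may be evaluated.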

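For example, if we wish to find the area between 0 and $\theta$ for $n = 8$ we have

$$\text{Area} = \frac{6}{5}\cdot\frac{4}{3}\cdot\frac{2}{1}\cdot\frac{1}{\pi}\int_0^\theta\cos^6\theta\,d\theta$$

$$= \frac{4}{3}\cdot\frac{2}{\pi}\int_0^\theta\cos^4\theta\,d\theta + \frac{1}{5}\cdot\frac{4}{3}\cdot\frac{2}{\pi}\cos^5\theta\sin\theta$$

$$= \frac{\theta}{\pi} + \frac{1}{\pi}\cos\theta\sin\theta + \frac{1}{3}\cdot\frac{2}{\pi}\cos^3\theta\sin\theta + \frac{1}{5}\cdot\frac{4}{3}\cdot\frac{2}{\pi}\cos^5\theta\sin\theta,$$

and it will be noticed that for $n = 10$ we shall merely have to add to this same
expression the term $\frac{1}{7}\cdot\frac{6}{5}\cdot\frac{4}{3}\cdot\frac{2}{\pi}\cos^7\theta\sin\theta$.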
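The step-by-step evaluation just described can be written as a loop. This is a sketch under the paper's own recursion, assuming Python: starting from $n = 2$ (area $\theta/\pi$) or $n = 3$ (area $\frac{1}{2}\sin\theta$), each increase of $n$ by 2 adds the term $\frac{c_n}{n-2}\cos^{n-3}\theta\sin\theta$, where $c_n$ is the constant of the curve; 0.5 is added to give the area from $-\infty$, as in the tables. The function names are hypothetical.

```python
import math

def z_constant(n):
    """(n-2)/(n-3) * (n-4)/(n-5) * ..., times 1/2 for odd n and 1/pi for even n."""
    c = 1.0
    for k in range(n - 2, 1, -2):
        c *= k / (k - 1)
    return c / 2 if n % 2 else c / math.pi

def area_below(z, n):
    """Area of the z-curve for samples of n from -infinity to z."""
    theta = math.atan(z)
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    if n % 2 == 0:
        area, first = theta / math.pi, 4      # n = 2 base case
    else:
        area, first = 0.5 * sin_t, 5          # n = 3 base case
    for k in range(first, n + 1, 2):
        area += z_constant(k) / (k - 2) * cos_t ** (k - 3) * sin_t
    return 0.5 + area

print(round(area_below(0.8, 10), 5), round(area_below(1.3, 10), 5))
```

For $n = 2$ the function reproduces $0.5 + \theta/\pi$, so the probability between $z = -1$ and $z = +1$ is exactly one half.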


The tables at the end of the paper give the area between $-\infty$ and $z$,
or $\theta = -\frac{\pi}{2}$ and $\theta = \tan^{-1}z$.

This is the same as 0.5 + the area between $\theta = 0$ and $\theta = \tan^{-1}z$, and as the
whole area of the curve is equal to 1, the tables give the probability that the
mean of the sample does not differ by more than $z$ times the standard deviation
of the sample from the mean of the population.

The whole area of the curve is equal to

$$\frac{n-2}{n-3}\cdot\frac{n-4}{n-5}\cdots\text{etc.}\times\int_{-\frac{\pi}{2}}^{+\frac{\pi}{2}}\cos^{n-2}\theta\,d\theta,$$

and since all the parts between the limits vanish at both limits this reduces to 1.
Similarly, the second moment coefficient is equal to

$$\frac{n-2}{n-3}\cdot\frac{n-4}{n-5}\cdots\text{etc.}\times\int_{-\frac{\pi}{2}}^{+\frac{\pi}{2}}\cos^{n-2}\theta\tan^2\theta\,d\theta = \frac{n-2}{n-3}\cdot\frac{n-4}{n-5}\cdots\text{etc.}\times\int_{-\frac{\pi}{2}}^{+\frac{\pi}{2}}(\cos^{n-4}\theta - \cos^{n-2}\theta)\,d\theta = \frac{n-2}{n-3} - 1 = \frac{1}{n-3}.$$

Hence the standard deviation of the curve is $1/\sqrt{n-3}$. The fourth moment
coefficient is equal to

$$\frac{n-2}{n-3}\cdot\frac{n-4}{n-5}\cdots\text{etc.}\times\int_{-\frac{\pi}{2}}^{+\frac{\pi}{2}}\cos^{n-2}\theta\tan^4\theta\,d\theta = \frac{n-2}{n-3}\cdot\frac{n-4}{n-5}\cdots\text{etc.}\times\int_{-\frac{\pi}{2}}^{+\frac{\pi}{2}}(\cos^{n-6}\theta - 2\cos^{n-4}\theta + \cos^{n-2}\theta)\,d\theta$$

$$= \frac{n-2}{n-3}\cdot\frac{n-4}{n-5} - \frac{2(n-2)}{n-3} + 1 = \frac{3}{(n-3)(n-5)}.$$

The odd moments are of course zero, as the curve is symmetrical, so

$$\beta_1 = 0, \qquad \beta_2 = \frac{3(n-3)}{n-5} = 3 + \frac{6}{n-5}.$$

Hence as $n$ increases the curve approaches the normal curve whose standard
deviation is $1/\sqrt{n-3}$.

$\beta_2$, however, is always greater than 3, indicating that large deviations are more
common than in the normal curve.
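These moments can be confirmed numerically in the same $\theta$ form, where $z^p\,y\,dz$ becomes $c_n\sin^p\theta\cos^{n-2-p}\theta\,d\theta$. A sketch assuming Python and Simpson's rule (a swapped-in quadrature); the moment of order $p$ is finite only when $p \le n - 2$, which is why the fourth moment is infinite for $n = 4$ and the second for $n = 2$. The function names are hypothetical.

```python
import math

def z_constant(n):
    """(n-2)/(n-3) * (n-4)/(n-5) * ..., times 1/2 for odd n and 1/pi for even n."""
    c = 1.0
    for k in range(n - 2, 1, -2):
        c *= k / (k - 1)
    return c / 2 if n % 2 else c / math.pi

def moment(n, power, steps=4000):
    """Even moment coefficient of the z-curve for samples of n, about z = 0."""
    if power > n - 2:
        raise ValueError("moment is infinite for this n")
    c = z_constant(n)
    a, b = -math.pi / 2, math.pi / 2
    h = (b - a) / steps
    f = lambda t: c * math.sin(t) ** power * math.cos(t) ** (n - 2 - power)
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

for n in (8, 10):
    mu2, mu4 = moment(n, 2), moment(n, 4)
    print(n, mu2, 1 / (n - 3), mu4 / mu2 ** 2, 3 * (n - 3) / (n - 5))
```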


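Diagram II. Solid curve $y = \frac{8}{7}\cdot\frac{6}{5}\cdot\frac{4}{3}\cdot\frac{2}{\pi}\cos^{10}\theta$, $x/s = \tan\theta$. Broken line curve $y = \frac{\sqrt{7}}{\sqrt{2\pi}}e^{-\frac{7x^2}{2}}$, the normal curve with the same standard deviation. [Abscissa: distance of mean from mean of population.]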
I have tabled the area for the normal curve with standard deviation $1/\sqrt{7}$ so as
to compare with my curve for $n = 10$.* It will be seen that odds laid according
to either table would not seriously differ till we reach $z = 0.8$, where the odds are
about 50 to 1 that the mean is within that limit: beyond that the normal curve
gives a false feeling of security, for example, according to the normal curve it is
99,986 to 14 (say 7000 to 1) that the mean of the population lies between $-\infty$
and $+1.3s$, whereas the real odds are only 99,819 to 181 (about 550 to 1).

Now 50 to 1 corresponds to three times the probable error in the normal curve
and for most purposes would be considered significant; for this reason I have only
tabled my curves for values of $n$ not greater than 10, but have given the $n = 9$
and $n = 10$ tables to one further place of decimals. They can be used as foundations
for finding values for larger samples.†

The table for $n = 2$ can be readily constructed by looking out $\theta = \tan^{-1}z$ in
Chambers's tables and then $0.5 + \theta/\pi$ gives the corresponding value.

Similarly $\frac{1}{2}\sin\theta + 0.5$ gives the values when $n = 3$.

There are two points of interest in the $n = 2$ curve. Here $s$ is equal to half
the distance between the two observations, $\tan^{-1}\frac{s}{s} = \frac{\pi}{4}$, so that between $+s$ and
$-s$ lies $2 \times \frac{\pi}{4} \times \frac{1}{\pi}$, or half the probability, i.e. if two observations have been made
and we have no other information, it is an even chance that the mean of the
(normal) population will lie between them. On the other hand the second moment
coefficient is

$$\frac{1}{\pi}\int_{-\frac{\pi}{2}}^{+\frac{\pi}{2}}\tan^2\theta\,d\theta = \frac{1}{\pi}\Big[\tan\theta - \theta\Big]_{-\frac{\pi}{2}}^{+\frac{\pi}{2}} = \infty,$$

or the standard deviation is infinite while the probable error is finite.


Section VI. Practical Test of the foregoing Equations

Before I had succeeded in solving my problem analytically, I had endeavoured 
to do so empirically. The material used was a correlation table containing the 
height and left middle finger measurements of 3000 criminals, from a paper by 
W. R. Macdonell (Biometrika, i, p. 219). The measurements were written out on 
3000 pieces of cardboard, which were then very thoroughly shuffled and drawn 
at random. As each card was drawn its numbers were written down in a book, 
which thus contains the measurements of 3000 criminals in a random order! 
Finally, each consecutive set of 4 was taken as a sample—750 in all—and the 
mean, standard deviation, and correlation‡ of each sample determined. The


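* See p. 29.

† E.g. if $n = 11$, to the corresponding value for $n = 9$, we add $\frac{7}{8}\cdot\frac{5}{6}\cdot\frac{3}{4}\cdot\frac{1}{2}\cdot\frac{1}{2}\cos^8\theta\sin\theta$:
if $n = 13$ we add as well $\frac{9}{10}\cdot\frac{7}{8}\cdot\frac{5}{6}\cdot\frac{3}{4}\cdot\frac{1}{2}\cdot\frac{1}{2}\cos^{10}\theta\sin\theta$, and so on.

‡ I hope to publish the results of the correlation work shortly. [See 3 below. Ed.]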




difference between the mean of each sample and the mean of the population
was then divided by the standard deviation of the sample, giving us the $z$ of
Section III.

This provides us with two sets of 750 standard deviations and two sets of 750 $z$'s
on which to test the theoretical results arrived at. The height and left middle
finger correlation table was chosen because the distribution of both was approximately
normal and the correlation was fairly high. Both frequency curves, however,
deviate slightly from normality, the constants being for height $\beta_1 = 0.0026$,
$\beta_2 = 3.175$, and for left middle finger lengths $\beta_1 = 0.0030$, $\beta_2 = 3.140$, and in
consequence there is a tendency for a certain number of larger standard deviations
to occur than if the distributions were normal. This, however, appears to make
very little difference to the distribution of $z$.

Another thing which interferes with the comparison is the comparatively large
groups in which the observations occur. The heights are arranged in 1 inch groups,
the standard deviation being only 2.54 inches: while the finger lengths were
originally grouped in millimetres, but unfortunately I did not at the time see the
importance of having a smaller unit and condensed them into 2 millimetre
groups, in terms of which the standard deviation is 2.74.

Several curious results follow from taking samples of 4 from material disposed 
in such wide groups. The following points may be noticed. 

(1) The means only occur as multiples of 0.25.

(2) The standard deviations occur as the square roots of the following types
of numbers: $n$, $n + 0.19$, $n + 0.25$, $n + 0.50$, $n + 0.69$, $n + 0.75$.

(3) A standard deviation belonging to one of these groups can only be associated
with a mean of a particular kind; thus a standard deviation of $\sqrt{2}$ can only occur
if the mean differs by a whole number from the group we take as origin, while
$\sqrt{1.69}$ will only occur when the mean is at $n + 0.25$.

(4) All the four individuals of the sample will occasionally come from the same 
group, giving a zero value for the standard deviation. Now this leads to an infinite 
value of z and is clearly due to too wide a grouping, for although two men may 
have the same height when measured by inches, yet the finer the measurements 
the more seldom will they be identical, till finally the chance that four men will 
have exactly the same height is infinitely small. If we had smaller grouping the 
zero values of the standard deviation might be expected to increase, and a similar 
consideration will show that the smaller values of the standard deviation would 
also be likely to increase, such as 0.433, when 3 fall in one group and 1 in an
adjacent group, or 0.50 when 2 fall in two adjacent groups. On the other hand,
when the individuals of the sample lie far apart, the argument of Sheppard’s 
correction will apply, the real value of the standard deviation being more likely 
to be smaller than that found owing to the frequency in any group being greater 
on the side nearer the mode. 
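Points (1) to (3) can be verified by brute force. A sketch assuming Python, samples of 4 taken from integer group values (the group index standing in for the measurement), and $s^2$ with divisor 4; exact fractions avoid rounding. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import product

def s_squared_types(groups=range(6)):
    """Fractional parts of s**2 (divisor 4) over every sample of 4 integer group values."""
    types = set()
    for sample in product(groups, repeat=4):
        s2 = Fraction(sum(x * x for x in sample), 4) - Fraction(sum(sample), 4) ** 2
        types.add(s2 - s2.numerator // s2.denominator)   # part after the whole number
    return sorted(types)

print([float(t) for t in s_squared_types()])
```

The six fractional parts, 0, 0.1875, 0.25, 0.5, 0.6875 and 0.75, round to the types listed in (2).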
These two effects of grouping will tend to neutralize each other in their effect
on the mean value of the standard deviation, but both will increase the variability. 

Accordingly, we find that the mean value of the standard deviation is quite 
close to that calculated, while in each case the variability is sensibly greater. The 
fit of the curve is not good, both for this reason and because the frequency is not 
evenly distributed owing to effects (2) and (3) of grouping. On the other hand, 
the fit of the curve giving the frequency of z is very good, and as that is the only 
practical point the comparison may be considered satisfactory. 

The following are the figures for height: 

Mean value of standard deviations: Calculated 2.027 ± 0.021

Observed 2.026

Difference = −0.001

Standard deviation of standard deviations: Calculated 0.8556 ± 0.015

Observed 0.9066

Difference = +0.0510
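The calculated figures for height follow from Section IV with $n = 4$ and $\sigma = 2.54$. A sketch assuming Python, the closed form $D = \sqrt{2}\,\Gamma(n/2)/\Gamma((n-1)/2)$ (equivalent to the product formulae, but not stated in the paper), and that the quoted ± is a probable error, $0.6745 \times \text{s.d.}/\sqrt{750}$. The last digit of the standard deviation may differ slightly from the printed value, presumably through rounding; the function name is hypothetical.

```python
import math

def sd_curve_mean_and_sd(n, sigma):
    """Mean and standard deviation of the distribution of s (divisor n) for samples of n:
    mean = D*sigma/sqrt(n) and mu_2 = sigma**2 / n * (n - 1 - D**2)."""
    D = math.sqrt(2) * math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2))
    mean = D * sigma / math.sqrt(n)
    sd = math.sqrt(sigma ** 2 / n * (n - 1 - D ** 2))
    return mean, sd

mean_s, sd_s = sd_curve_mean_and_sd(4, 2.54)       # samples of 4, height s.d. 2.54 in.
probable_error_of_mean = 0.6745 * sd_s / math.sqrt(750)
print(round(mean_s, 3), round(sd_s, 4), round(probable_error_of_mean, 3))
```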


Comparison of Fit. Theoretical Equation: $y = \frac{16 \times 750}{\sqrt{2\pi}\,\sigma^3}\,x^2e^{-\frac{2x^2}{\sigma^2}}$

Whence $\chi^2$ = 48.06, P = 0.00006 (about).


In tabling the observed frequency, values between 0.0125 and 0.0875 were
included in one group, while between 0.0875 and 0.1125 they were divided over
the two groups. As an instance of the irregularity due to grouping I may mention
that there were 31 cases of standard deviations 1.30 (in terms of the grouping)
which is 0.5117 in terms of the standard deviation of the population, and they were
therefore divided over the groups 0.4 to 0.5 and 0.5 to 0.6. Had they all been
counted in groups 0.5 to 0.6 $\chi^2$ would have fallen to 29.85 and P would have risen
to 0.03. The $\chi^2$ test presupposes random sampling from a frequency following the
given law, but this we have not got owing to the interference of the grouping.
When, however, we test the z's where the grouping has not had so much effect,
we find a close correspondence between the theory and the actual result. 

There were three cases of infinite values of z which, for the reasons given
above, were given the next largest values which occurred, namely +6 or -6.
The rest were divided into groups of 0.1; 0.04, 0.05 and 0.06 being divided
between the two groups on either side.

The calculated value for the standard deviation of the frequency curve was
1 (± 0.017), while the observed was 1.039. The value of the standard deviation is
really infinite, as the fourth moment coefficient is infinite, but as we have
arbitrarily limited the infinite cases we may take as an approximation 1/√1500, from
which the value of the probable error given above is obtained. The fit of the
curve is as follows:


Comparison of Fit. Theoretical Equation:

$$y = \frac{2N}{\pi}\cos^{4}\theta, \qquad z = \tan\theta$$

Scale of z | Calculated frequency | Observed frequency | Difference
Less than -3.05 | 5 | 9 | +4
-3.05 to -2.05 | 9½ | 14½ | +5
-2.05 to -1.55 | 13½ | 11½ | -2
-1.55 to -1.05 | 34½ | 33 | -1½
-1.05 to -0.75 | 44½ | 43½ | -1
-0.75 to -0.45 | 78½ | 70½ | -8
-0.45 to -0.15 | 119 | 119½ | +½
-0.15 to +0.15 | 141 | 151½ | +10½
+0.15 to +0.45 | 119 | 122 | +3
+0.45 to +0.75 | 78½ | 67½ | -11
+0.75 to +1.05 | 44½ | 49 | +4½
+1.05 to +1.55 | 34½ | 26½ | -8
+1.55 to +2.05 | 13½ | 16 | +2½
+2.05 to +3.05 | 9½ | 10 | +½
More than +3.05 | 5 | 6 | +1

Whence χ² = 12.44, P = 0.56.
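The calculated column of this table depends only on the curve and on the group boundaries. The following sketch recomputes that column under two assumptions: N = 750, and the boundaries listed above. Substituting z = tan θ makes the frequency per unit of θ proportional to cos²θ, which integrates in closed form. The sketch also recomputes the probable error of ±0.017 quoted earlier from the approximation 1/√1500. It uses the usual convention that a probable error is 0.6745 times a standard error. The names `EDGES`, `cdf_z` and `calculated` are illustrative.

```python
import math

N = 750  # number of samples of four

# Boundaries of the groups on the scale of z used in the comparison table.
EDGES = [-math.inf, -3.05, -2.05, -1.55, -1.05, -0.75, -0.45, -0.15,
         0.15, 0.45, 0.75, 1.05, 1.55, 2.05, 3.05, math.inf]


def cdf_z(z):
    """Proportion of y = (2N/pi) cos^4(theta), z = tan(theta), lying below z.

    Since dz = sec^2(theta) d(theta), the frequency per d(theta) is
    proportional to cos^2(theta), whose integral is (theta + sin cos) / 2.
    """
    theta = math.atan(z)  # atan(-inf) = -pi/2 and atan(inf) = pi/2
    return 0.5 + (theta + math.sin(theta) * math.cos(theta)) / math.pi


calculated = [N * (cdf_z(b) - cdf_z(a)) for a, b in zip(EDGES, EDGES[1:])]
print(" | ".join(f"{c:.1f}" for c in calculated))

# Probable error of the s.d. of the z's from the approximation 1/sqrt(1500).
probable_error = 0.6745 / math.sqrt(2 * N)
print(f"probable error = {probable_error:.3f}")
```

The recomputed frequencies agree with the printed calculated column to within about one unit. The printed figures are rounded to halves.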


This is very satisfactory, especially when we consider that as a rule observations 
are tested against curves fitted from the mean and one or more other moments 
of the observations, so that considerable correspondence is only to be expected; 
while this curve is exposed to the full errors of random sampling, its constants 
having been calculated quite apart from the observations. 

The left middle finger samples show much the same features as those of the 
height, but as the grouping is not so large compared to the variability the curves 
fit the observations more closely. Diagrams III* and IV give the standard
deviations and the z's for this set of samples. The results are as follows:

Mean value of standard deviations: Calculated 2.186 ± 0.023

Observed 2.179

Difference = -0.007

* There are three small mistakes in plotting the observed values in Diagram III, which 
make the fit appear worse than it really is. 










[Diagram; horizontal scale: standard deviation of the sample.]
Standard deviation of standard deviations: Calculated 0.9224 ± 0.016

Observed 0.9802

Difference = +0.0578
Comparison of Fit. Theoretical Equation:

$$y = \frac{16 \times 750}{\sqrt{2\pi}\,\sigma^{3}}\,x^{2}\,e^{-2x^{2}/\sigma^{2}}$$

Scale in terms of standard deviation of population

Calculated frequency: 1½ | 10½ | 27 | 45½ | 64½ | 78½ | 87 | 88 | 81½ | 71 | 58 | 45 | 33 | 23 | 15 | 9½ | 5½

Observed frequency: 2 | 14 | 27½ | 51 | 64½ | 91 | 94½ | 68½ | 65½ | 73 | 48½ | 40½ | 42½ | 20 | 22½ | 12 | 5

Difference: +½ | +3½ | +½ | +5½ | 0 | +12½ | +7½ | -19½ | -16 | +2 | -9½ | -4½ | +9½ | -3 | +7½ | +2½ | -½

Whence χ² = 21.80, P = 0.19.
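The calculated row can be rebuilt from the theoretical equation. This is a sketch under two assumptions. First, σ = 1, since the scale is in units of the population standard deviation. Second, the groups are taken to run 0 to 0.1, 0.1 to 0.2, and so on, which is our reading of the seventeen printed columns rather than something the text states. For samples of four, 4s²/σ² follows a chi-square law with 3 degrees of freedom. The integral of the curve therefore has a closed form in terms of the error function. The function name `cdf_s` is illustrative.

```python
import math

N = 750  # number of samples of four


def cdf_s(x):
    """Integral from 0 to x of 16/sqrt(2*pi) * t**2 * exp(-2 t**2) dt (sigma = 1).

    Equivalently P(4 s^2 <= 4 x^2) for a chi-square variable with 3 degrees of freedom.
    """
    return (math.erf(math.sqrt(2) * x)
            - 2 * math.sqrt(2 / math.pi) * x * math.exp(-2 * x * x))


# Assumed grouping: 0-0.1, 0.1-0.2, ..., 1.6-1.7, i.e. the 17 printed columns.
calculated = [N * (cdf_s((k + 1) / 10) - cdf_s(k / 10)) for k in range(17)]
beyond = N * (1 - cdf_s(1.7))  # expected frequency above the last column
print(" | ".join(f"{c:.1f}" for c in calculated))
print(f"beyond 1.7: {beyond:.1f}")
```

Under this grouping the recomputed values lie within about one unit of the printed row.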

Value of standard deviation: Calculated 1 (± 0.017)

Observed 0.982

Difference = -0.018

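Comparison of Fit. Theoretical Equation:

$$y = \frac{2N}{\pi}\cos^{4}\theta, \qquad z = \tan\theta$$

Scale of z | Calculated frequency | Observed frequency | Difference
Less than -3.05 | 5 | 4 | -1
-3.05 to -2.05 | 9½ | 15½ | +6
-2.05 to -1.55 | 13½ | 18 | +4½
-1.55 to -1.05 | 34½ | 33½ | -1
-1.05 to -0.75 | 44½ | 44 | -½
-0.75 to -0.45 | 78½ | 75 | -3½
-0.45 to -0.15 | 119 | 122 | +3
-0.15 to +0.15 | 141 | 138 | -3
+0.15 to +0.45 | 119 | 120½ | +1½
+0.45 to +0.75 | 78½ | 71 | -7½
+0.75 to +1.05 | 44½ | 46½ | +2
+1.05 to +1.55 | 34½ | 36 | +1½
+1.55 to +2.05 | 13½ | 11 | -2½
+2.05 to +3.05 | 9½ | … | …
More than +3.05 | 5 | … | …

Whence χ² = 7.39, P = 0.92.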

A very close fit. 

We see then that if the distribution is approximately normal our theory gives
us a satisfactory measure of the certainty to be derived from a small sample in
both the cases we have tested; but we have an indication that a fine grouping is
of advantage. If the distribution is not normal, the mean and the standard
deviation of a sample will be positively correlated, so that although both will have
greater variability, yet they will tend to counteract each other, a mean deviating
largely from the general mean tending to be divided by a larger standard
deviation. Consequently, I believe that the table given in Section VII below may
be used in estimating the degree of certainty arrived at by the mean of a few
experiments, in the case of most laboratory or biological work where the
distributions are as a rule of a “cocked hat” type and so sufficiently nearly normal.


Section VII. Tables of

$$\frac{n-2}{n-3}\cdot\frac{n-4}{n-5}\cdots\frac{3}{2}\cdot\frac{1}{2}\int_{-\pi/2}^{\tan^{-1}z}\cos^{n-2}\theta\,d\theta \quad \text{if } n \text{ be odd},$$

$$\frac{n-2}{n-3}\cdot\frac{n-4}{n-5}\cdots\frac{2}{1}\cdot\frac{1}{\pi}\int_{-\pi/2}^{\tan^{-1}z}\cos^{n-2}\theta\,d\theta \quad \text{if } n \text{ be even},$$

FOR VALUES OF n FROM 4 TO 10 INCLUSIVE

Together with $\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-x^{2}/2}\,dx$ for comparison when n = 10

z | n = 4 | n = 5 | n = 6 | n = 7 | n = 8 | n = 9 | n = 10 | For comparison
0.1 | 0.5633 | 0.5745 | 0.5841 | 0.5928 | 0.6006 | 0.60787 | 0.61462 | 0.60411
0.2 | 0.6241 | 0.6458 | 0.6634 | 0.6798 | 0.6936 | 0.70705 | 0.71846 | 0.70159
0.3 | 0.6804 | 0.7096 | 0.7340 | 0.7549 | 0.7733 | 0.78961 | 0.80423 | 0.78641
0.4 | 0.7309 | 0.7657 | 0.7939 | 0.8175 | 0.8376 | 0.85465 | 0.86970 | 0.85520
0.5 | 0.7749 | 0.8131 | 0.8428 | 0.8667 | 0.8863 | 0.90251 | 0.91609 | 0.90691
0.6 | 0.8125 | 0.8518 | 0.8813 | 0.9040 | 0.9218 | 0.93600 | 0.94732 | 0.94375
0.7 | 0.8440 | 0.8830 | 0.9109 | 0.9314 | 0.9468 | 0.95851 | 0.96747 | 0.96799
0.8 | 0.8701 | 0.9076 | 0.9332 | 0.9512 | 0.9640 | 0.97328 | 0.98007 | 0.98253
0.9 | 0.8915 | 0.9269 | 0.9498 | 0.9652 | 0.9756 | 0.98279 | 0.98780 | 0.99137
1.0 | 0.9092 | 0.9419 | 0.9622 | 0.9751 | 0.9834 | 0.98890 | 0.99252 | 0.99820
1.1 | 0.9236 | 0.9537 | 0.9714 | 0.9821 | 0.9887 | 0.99280 | 0.99539 | 0.99926
1.2 | 0.9354 | 0.9628 | 0.9782 | 0.9870 | 0.9922 | 0.99528 | 0.99713 | 0.99971
1.3 | 0.9451 | 0.9700 | 0.9832 | 0.9905 | 0.9946 | 0.99688 | 0.99819 | 0.99986
1.4 | 0.9531 | 0.9756 | 0.9870 | 0.9930 | 0.9962 | 0.99791 | 0.99885 | 0.99989
1.5 | 0.9598 | 0.9800 | 0.9899 | 0.9948 | 0.9973 | 0.99859 | 0.99926 | 0.99999
1.6 | 0.9653 | 0.9836 | 0.9920 | 0.9961 | 0.9981 | 0.99903 | 0.99951 |
1.7 | 0.9699 | 0.9864 | 0.9937 | 0.9970 | 0.9986 | 0.99933 | 0.99968 |
1.8 | 0.9737 | 0.9886 | 0.9950 | 0.9977 | 0.9990 | 0.99953 | 0.99978 |
1.9 | 0.9770 | 0.9904 | 0.9959 | 0.9983 | 0.9992 | 0.99967 | 0.99985 |
2.0 | 0.9797 | 0.9919 | 0.9967 | 0.9986 | 0.9994 | 0.99976 | 0.99990 |
2.1 | 0.9821 | 0.9931 | 0.9973 | 0.9989 | 0.9996 | 0.99983 | 0.99993 |
2.2 | 0.9841 | 0.9941 | 0.9978 | 0.9992 | 0.9997 | 0.99987 | 0.99995 |
2.3 | 0.9858 | 0.9950 | 0.9982 | 0.9993 | 0.9998 | 0.99991 | 0.99996 |
2.4 | 0.9873 | 0.9957 | 0.9985 | 0.9995 | 0.9998 | 0.99993 | 0.99997 |
2.5 | 0.9886 | 0.9963 | 0.9987 | 0.9996 | 0.9998 | 0.99995 | 0.99998 |
2.6 | 0.9898 | 0.9967 | 0.9989 | 0.9996 | 0.9999 | 0.99996 | 0.99999 |
2.7 | 0.9908 | 0.9972 | 0.9991 | 0.9997 | 0.9999 | 0.99997 | 0.99999 |
2.8 | 0.9916 | 0.9975 | 0.9992 | 0.9998 | 0.9999 | 0.99998 | 0.99999 |
2.9 | 0.9924 | 0.9978 | 0.9993 | 0.9998 | 0.9999 | 0.99998 | 0.99999 |
3.0 | 0.9931 | 0.9981 | 0.9994 | 0.9998 | … | 0.99999 | … |
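The tabled quantity can be computed directly from the formula in the heading. This sketch is not Student's own method of computation. The constant in front of the integral is built from the same chain of fractions. The integral itself is evaluated by Simpson's rule, which is our choice; any quadrature would serve. The function names are illustrative.

```python
import math


def normalising_constant(n):
    """1 / integral of cos^(n-2)(theta) over (-pi/2, pi/2), built as in the heading.

    (n-2)/(n-3) * (n-4)/(n-5) * ... ending with 3/2 * 1/2 for odd n,
    or with 2/1 * 1/pi for even n.
    """
    c = 1.0
    k = n - 2
    while k >= 2:
        c *= k / (k - 1)
        k -= 2
    return c / 2 if n % 2 else c / math.pi


def table_value(z, n, steps=2000):
    """Probability that (sample mean - population mean) / s lies below z.

    Integrates cos^(n-2)(theta) from -pi/2 to arctan(z) by Simpson's rule
    (steps must be even).
    """
    a, b = -math.pi / 2, math.atan(z)
    h = (b - a) / steps
    total = 0.0
    for i in range(steps + 1):
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * math.cos(a + i * h) ** (n - 2)
    return normalising_constant(n) * total * h / 3


for n in (4, 5, 6):
    print(n, round(table_value(1.0, n), 4))
```

At z = 1.0 this gives 0.9092, 0.9419 and 0.9622 for n = 4, 5 and 6, matching the table.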
Section VIII. Explanation of Tables

The tables give the probability that the value of the mean, measured from the
mean of the population, in terms of the standard deviation of the sample will lie
between -∞ and z. Thus, to take the table for samples of 6, the probability of
the mean of the population lying between -∞ and once the standard deviation
of the sample is 0.9622, or the odds are about 24 to 1 that the mean of the
population lies between these limits.

The probability is therefore 0.0378 that it is greater than once the standard
deviation and 0.0756 that it lies outside ±1.0 times the standard deviation.


Section IX. Illustrations of Method

Illustration I. As an instance of the kind of use which may be made of the 
tables, I take the following figures from a table by A. R. Cushny and A. R. Peebles 
in the Journal of Physiology for 1904, showing the different effects of the optical 
isomers of hyoscyamine hydrobromide in producing sleep. The sleep of ten 
patients was measured without hypnotic and after treatment (1) with D. hyoscyamine
hydrobromide, (2) with L. hyoscyamine hydrobromide. The average
number of hours' sleep gained by the use of the drug is tabulated below.

The conclusion arrived at was that in the usual dose 2 was, but 1 was not, of
value as a soporific.

Additional hours' sleep gained by the use of hyoscyamine hydrobromide

Patient | 1 (Dextro-) | 2 (Laevo-) | Difference (2 - 1)
1 | +0.7 | +1.9 | +1.2
2 | -1.6 | +0.8 | +2.4
3 | -0.2 | +1.1 | +1.3
4 | -1.2 | +0.1 | +1.3
5 | -0.1 | -0.1 | 0
6 | +3.4 | +4.4 | +1.0
7 | +3.7 | +5.5 | +1.8
8 | +0.8 | +1.6 | +0.8
9 | 0 | +4.6 | +4.6
10 | +2.0 | +3.4 | +1.4
Mean | +0.75 | +2.33 | +1.58
S.D. | 1.70 | 1.90 | 1.17
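The means and standard deviations at the foot of the table can be recomputed from the ten rows. As throughout this paper, the standard deviation uses the divisor n rather than n - 1, and the sketch follows that convention. The variable and function names are illustrative.

```python
import math

# Hours of sleep gained, patients 1 to 10 (table above).
dextro = [0.7, -1.6, -0.2, -1.2, -0.1, 3.4, 3.7, 0.8, 0.0, 2.0]
laevo = [1.9, 0.8, 1.1, 0.1, -0.1, 4.4, 5.5, 1.6, 4.6, 3.4]
diff = [b - a for a, b in zip(dextro, laevo)]


def mean_and_sd(xs):
    """Mean and standard deviation with divisor n, as used throughout this paper."""
    n = len(xs)
    m = sum(xs) / n
    return m, math.sqrt(sum((x - m) ** 2 for x in xs) / n)


for name, xs in (("1 (Dextro-)", dextro), ("2 (Laevo-)", laevo), ("2 - 1", diff)):
    m, s = mean_and_sd(xs)
    print(f"{name:12s} mean {m:+.2f}  s.d. {s:.2f}  z = {m / s:.2f}")
```

This prints means of +0.75, +2.33 and +1.58 and standard deviations of 1.70, 1.90 and 1.17. It also prints the ratios 0.44, 1.23 and 1.35 that are looked up in the table below.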


First let us see what is the probability that 1 will on the average give increase
of sleep; i.e. what is the chance that the mean of the population of which these
experiments are a sample is positive. +0.75/1.70 = 0.44, and looking out
z = 0.44 in the table for ten experiments we find by interpolating between 0.8697
and 0.9161 that 0.44 corresponds to 0.8873, or the odds are 0.887 to 0.113 that
the mean is positive.
That is about 8 to 1, and would correspond in the normal curve to about 1.8
times the probable error. It is then very likely that 1 gives an increase of sleep,
but it would occasion no surprise if the results were reversed by further experiments.

If now we consider the chance that 2 is actually a soporific, we have the mean
increase of sleep = 2.33/1.90, or 1.23 times the s.d. From the table the probability
corresponding to this is 0.9974, i.e. the odds are nearly 400 to 1 that such is the
case. This corresponds to about 4.15 times the probable error in the normal
curve. But I take it the real point of the authors was that 2 is better than 1.
This we must test by making a new series, subtracting 1 from 2. The mean
value of this series is +1.58, while the s.d. is 1.17, the mean value being +1.35
times the s.d. From the table the probability is 0.9985, or the odds are about 666
to 1 that 2 is the better soporific. The low value of the s.d. is probably due to the
different drugs reacting similarly on the same patient, so that there is correlation
between the results.
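The table lookups and normal-curve comparisons in this paragraph can be followed step by step. This sketch assumes simple linear interpolation between the tabled rows; the text does not say how Student interpolated. For the normal-curve comparison it uses `NormalDist` from Python's standard library, with 0.6745 as the ratio of probable error to standard deviation. `TABLE_N10`, `interpolate` and `in_probable_errors` are illustrative names.

```python
from statistics import NormalDist

# Rows z = 1.2, 1.3 and 1.4 of the n = 10 column of the table in Section VII.
TABLE_N10 = [(1.2, 0.99713), (1.3, 0.99819), (1.4, 0.99885)]


def interpolate(z, rows=TABLE_N10):
    """Linear interpolation between neighbouring tabled rows."""
    for (z0, p0), (z1, p1) in zip(rows, rows[1:]):
        if z0 <= z <= z1:
            return p0 + (z - z0) / (z1 - z0) * (p1 - p0)
    raise ValueError("z lies outside the rows supplied")


def in_probable_errors(p):
    """Normal deviate with one-sided probability p, expressed in probable errors."""
    return NormalDist().inv_cdf(p) / 0.6745


for label, z in (("2 is a soporific", 2.33 / 1.90), ("2 is better than 1", 1.58 / 1.17)):
    p = round(interpolate(z), 4)  # four-figure probability, as in the text
    print(f"{label}: z = {z:.2f}, p = {p}, odds about {p / (1 - p):.0f} to 1, "
          f"{in_probable_errors(p):.2f} probable errors")
```

With p rounded to four figures, the odds come out at about 384 to 1 and 666 to 1. The value p = 0.9974 corresponds to about 4.14 probable errors, close to the 4.15 in the text.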

Of course odds of this kind make it almost certain that 2 is the better soporific,
and in practical life such a high probability is in most matters considered as
a certainty.

Illustration II. Cases where the tables will be useful are not uncommon in
agricultural work, and they would be more numerous if the advantages of being
able to apply statistical reasoning were borne in mind when planning the
experiments. I take the following instances from the accounts of the Woburn farming
experiments published yearly by Dr Voelcker in the Journal of the Agricultural Society.

A short series of pot culture experiments were conducted in order to determine
the causes which lead to the production of Hard (glutinous) wheat or Soft (starchy)
wheat. In three successive years a bulk of seed corn of one variety was picked
over by hand and two samples were selected, one consisting of “hard” grains and
the other of “soft”. Some of each of these were planted in both heavy and light
soil and the resulting crops were weighed and examined for hard and soft corn.
The conclusion drawn was that the effect of selecting the seed was negligible
compared with the influence of the soil.

This conclusion was thoroughly justified, the heavy soil producing in each case
nearly 100% of hard corn, but still the effect of selecting the seed could just be
traced in each year.

But a curious point, to which Dr Voelcker draws attention in the second year's
report, is that the soft seeds produced the higher yield of both corn and straw. In
view of the well-known fact that the varieties which have a high yield tend to
produce soft corn, it is interesting to see how much evidence the experiments
afford as to the correlation between softness and fertility in the same variety.
Further, Mr Hooker (Journal of the Royal Statistical Society, 1907) has shown that
the yield of wheat in one year is largely determined by the weather during the
preceding harvest. Dr Voelcker's results may afford a clue as to the way in which
the seed is affected, and would almost justify the selection of particular soils for
growing seed wheat.*

The figures are as follows, the yields being expressed in grammes per pot: 

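 | 1899 Light | 1899 Heavy | 1900 Light | 1900 Heavy | 1901 Light | 1901 Heavy | Average | Standard deviation | Average ÷ s.d.
Yield of corn from soft seed | … | … | … | 13.55 | 7.48 | 15.39 | 11.328 | |
Yield of corn from hard seed | … | … | … | 13.36 | 7.97 | 13.13 | 10.643 | |
Difference | … | … | … | +0.19 | -0.49 | +2.26 | +0.685 | 0.778 | 0.88
Yield of straw from soft seed | … | … | … | 20.21 | 13.97 | 22.57 | 17.442 | |
Yield of straw from hard seed | … | … | … | 20.26 | 11.71 | 18.96 | 15.927 | |
Difference | +2.10 | +0.39 | +0.78 | -0.05 | +2.26 | +3.61 | +1.515 | 1.261 | 1.20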

If we wish to find the odds that soft seed will give a better yield of corn on the
average, we divide the average difference by the standard deviation, giving us

z = 0.88.

Looking this up in the table for n = 6 we find p = 0.9465, or the odds are
0.9465 to 0.0535, about 18 to 1.

Similarly for straw z = 1.20, p = 0.9782, and the odds about 45 to 1.
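The figures for straw and the table lookup for corn can be reproduced in a few lines. The six straw differences (soft minus hard seed) are taken from the table. The 1901 light-soil entry is read as +2.26, that is 13.97 - 11.71. This is also the value that reproduces the tabled average of 1.515 and standard deviation of 1.261. The interpolation for corn is linear between the rows 0.8 and 0.9 of the n = 6 column. That is our assumption about how the lookup was made.

```python
import math

# Straw, soft-seed minus hard-seed yield in grammes per pot (six experiments).
straw_diff = [2.10, 0.39, 0.78, -0.05, 2.26, 3.61]

n = len(straw_diff)
mean = sum(straw_diff) / n
sd = math.sqrt(sum((d - mean) ** 2 for d in straw_diff) / n)  # divisor n
z = mean / sd

# Corn: z = 0.88 falls between the rows 0.8 and 0.9 of the n = 6 column.
p_corn = 0.9332 + (0.88 - 0.8) / (0.9 - 0.8) * (0.9498 - 0.9332)

print(f"straw: mean {mean:.3f}, s.d. {sd:.3f}, z = {z:.2f}")
print(f"corn: p = {p_corn:.4f}, odds about {p_corn / (1 - p_corn):.0f} to 1")
```

This gives a straw mean of 1.515, an s.d. of 1.261 and z = 1.20. For corn it gives p = 0.9465, or odds of about 18 to 1.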

In order to see whether such odds are sufficient for a practical man to draw a
definite conclusion, I take another set of experiments in which Dr Voelcker
compares the effects of different artificial manures used with potatoes on the large scale.
The figures represent the difference between the crops grown with the use of
sulphate of potash and kainit respectively in both 1904 and 1905 (two experiments
in each year):

1904: +10 cwt. 3 qr. 20 lb. and +1 ton 10 cwt. 1 qr. 26 lb.

1905: +6 cwt. 0 qr. 3 lb. and +13 cwt. 2 qr. 8 lb.

The average gain by the use of sulphate of potash was 15.25 cwt. and the
s.d. 9 cwt., whence, if we want the odds that the conclusion given below is right,
z = 1.7, corresponding, when n = 4, to p = 0.9698 or odds of 32 to 1; this is midway
between the odds in the former example. Dr Voelcker says: “It may now fairly
be concluded that for the potato crop on light land 1 cwt. per acre of sulphate of
potash is a better dressing than kainit.”
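The average and standard deviation quoted for the potato experiments can be checked once the four gains are converted to hundredweights. The sketch uses the standard avoirdupois relations: 28 lb. to the quarter, 112 lb. to the hundredweight and 20 cwt. to the ton. The helper `to_cwt` is a hypothetical name introduced for this illustration.

```python
import math

LB_PER_QR, LB_PER_CWT, CWT_PER_TON = 28, 112, 20  # imperial avoirdupois units


def to_cwt(ton=0, cwt=0, qr=0, lb=0):
    """Convert a ton / cwt. / qr. / lb. weight to hundredweights."""
    return ton * CWT_PER_TON + cwt + (qr * LB_PER_QR + lb) / LB_PER_CWT


gains = [to_cwt(cwt=10, qr=3, lb=20), to_cwt(ton=1, cwt=10, qr=1, lb=26),  # 1904
         to_cwt(cwt=6, qr=0, lb=3), to_cwt(cwt=13, qr=2, lb=8)]           # 1905
n = len(gains)
mean = sum(gains) / n
sd = math.sqrt(sum((g - mean) ** 2 for g in gains) / n)  # divisor n
print(f"mean {mean:.2f} cwt., s.d. {sd:.2f} cwt., z = {mean / sd:.2f}")
```

This gives a mean gain of about 15.25 cwt., an s.d. of about 9.2 cwt. and z ≈ 1.66. The text rounds these to 9 cwt. and 1.7.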

As an example of how the tables should be used with caution, I take the
following pot culture experiments to test whether it made any difference whether
large or small seeds were sown.

* And perhaps a few experiments to see whether there is a correlation between yield and
“mellowness” in barley.

Illustration III. In 1899 and in 1903 “head corn” and “tail corn” were taken
from the same bulks of barley and sown in pots. The yields in grammes were
as follows:

 | 1899 | 1903
Large seed | 13.9 | 7.3
Small seed | 14.4 | 8.7
Difference | +0.5 | +1.4

The average gain is thus 0.95 and the s.d. 0.45, giving z = 2.1. Now the table
for n = 2 is not given, but if we look up the angle whose tangent is 2.1 in Chambers's
tables,

$$p = \frac{\tan^{-1} 2.1}{180^{\circ}} + 0.5 = \frac{64^{\circ}\,39'}{180^{\circ}} + 0.5 = 0.859,$$


so that the odds are about 6 to 1 that small corn gives a better yield than large. 
These odds* are those which would be laid, and laid rightly, by a man whose only 
knowledge of the matter was contained in the two experiments. Anyone
conversant with pot culture would however know that the difference between the
two results would generally be greater and would correspondingly moderate the
certainty of his conclusion. In point of fact a large-scale experiment confirmed
the result, the small corn yielding about 15% more than the large.
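For samples of two, the curve becomes uniform in the angle θ = tan⁻¹ z. This is why an ordinary table of tangents suffices. The sketch below recomputes z and p from the two pot yields, using the divisor n for the standard deviation as elsewhere in the paper.

```python
import math

# Small-seed minus large-seed yield (grammes), 1899 and 1903.
diffs = [14.4 - 13.9, 8.7 - 7.3]
n = len(diffs)
mean = sum(diffs) / n
sd = math.sqrt(sum((d - mean) ** 2 for d in diffs) / n)  # divisor n
z = mean / sd

# For n = 2 the frequency is uniform in theta = arctan(z) over (-90, 90) degrees,
# so P = theta / 180 degrees + 0.5.
theta_deg = math.degrees(math.atan(z))
p = theta_deg / 180 + 0.5
print(f"z = {z:.3f}, angle = {theta_deg:.2f} degrees, p = {p:.3f}, odds {p / (1 - p):.1f} to 1")
```

This gives z ≈ 2.111 and an angle of about 64.65 degrees (64°39'). The resulting p ≈ 0.859 corresponds to odds of about 6 to 1.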

I will conclude with an example which comes beyond the range of the tables, 
there being eleven experiments. 

To test whether it is of advantage to kiln-dry barley seed before sowing, seven 
varieties of barley were sown (both kiln-dried and not kiln-dried) in 1899 and four 
in 1900; the results are given in the table. 

It will be noticed that the kiln-dried seed gave on an average the larger yield 



 | Lb. head corn per acre | | | Price of head corn in shillings per quarter | | | Cwt. straw per acre | | | Value of crop per acre in shillings† | |
Year | N.K.D. | K.D. | Diff. | N.K.D. | K.D. | Diff. | N.K.D. | K.D. | Diff. | N.K.D. | K.D. | Diff.
1899 | 1903 | 2009 | +106 | 26½ | 26½ | 0 | 19½ | 25 | +5½ | 140½ | 152 | +11½
1899 | 1935 | 1915 | -20 | 28 | 26½ | -1½ | 22½ | 24 | +1½ | 152½ | 145 | -7½
1899 | 1910 | 2011 | +101 | 29½ | 28½ | -1 | 23 | 24 | +1 | 158½ | 161 | +2½
1899 | 2496 | 2463 | -33 | 30 | 29 | -1 | 23 | 28 | +5 | 204½ | 199½ | -5
1899 | 2108 | 2180 | +72 | 27½ | 27 | -½ | 22½ | 22½ | 0 | 162 | 164 | +2
1899 | 1961 | 1925 | -36 | 26 | 26 | 0 | 19½ | 19 | -½ | 142 | 139½ | -2½
1899 | 2060 | 2122 | +62 | 29 | 26 | -3 | 24½ | 22½ | -2 | 168 | 155 | -13
1900 | 1444 | 1482 | +38 | 29½ | 28½ | -1 | 15½ | 16 | +½ | 118 | 117½ | -½
1900 | 1612 | 1542 | -70 | 28½ | 28 | -½ | 18 | 17½ | -½ | 128½ | 121 | -7½
1900 | 1316 | 1443 | +127 | 30 | 29 | -1 | 14½ | 15½ | +1 | 109½ | 116½ | +7
1900 | 1511 | 1535 | +24 | 28½ | 28 | -½ | 17 | 17½ | +½ | 120 | 120½ | +½
Average | 1841.5 | 1875.2 | +33.7 | 28.45 | 27.55 | -0.91 | 19.95 | 21.05 | +1.10 | 145.82 | 144.68 | -1.14
Standard deviation | | | 63.1 | | | 0.79 | | | 2.25 | | | 6.67
Standard deviation ÷ √8 | | | 22.3 | | | 0.28 | | | 0.80 | | | 2.40

(N.K.D. = not kiln-dried; K.D. = kiln-dried; Diff. = K.D. minus N.K.D.)



† Straw being valued at 15s. per ton.

* [Through a numerical slip, now corrected, Student had given the odds as 33 to 1 and
it is to this figure that the remarks in this paragraph relate. Ed.]
of corn and straw, but that the quality was almost always inferior. At first sight
this might be supposed to be due to superior germinating power in the kiln-dried
seed, but my farming friends tell me that the effect of this would be that the
kiln-dried seed would produce the better quality barley. Dr Voelcker draws the
conclusion: “In such seasons as 1899 and 1900 there is no particular advantage in 
kiln-drying before sowing.” Our examination completely justifies this and adds 
“and the quality of the resulting barley is inferior though the yield may be 
greater”. 

In this case I propose to use the approximation given by the normal curve
with standard deviation s/√(n - 3) and therefore use Sheppard's tables, looking up
the difference divided by s/√8. The probability in the case of yield of corn per
acre is given by looking up 33.7/22.3 = 1.51 in Sheppard's tables. This gives
p = 0.934, or the odds are about 14 to 1 that kiln-dried corn gives the higher
yield.

Similarly 0.91/0.28 = 3.25, corresponding to p = 0.9994,* so that the odds are
very great that kiln-dried seed gives barley of a worse quality than seed which
has not been kiln-dried.

Similarly, it is about 11 to 1 that kiln-dried seed gives more straw and about
2 to 1 that the total value of the crop is less with kiln-dried seed.
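The normal approximation used here can be reproduced with `NormalDist` from Python's standard library, which stands in for Sheppard's tables. The fourth K.D. yield is read as 2463, that is 2496 - 33. This is the reading that also reproduces the tabled K.D. average of 1875.2. For the other three columns, the sketch simply takes the averages and the values of s/√8 given in the table.

```python
import math
from statistics import NormalDist

# Lb. head corn per acre: not kiln-dried (N.K.D.) and kiln-dried (K.D.).
nkd = [1903, 1935, 1910, 2496, 2108, 1961, 2060, 1444, 1612, 1316, 1511]
kd = [2009, 1915, 2011, 2463, 2180, 1925, 2122, 1482, 1542, 1443, 1535]

diffs = [b - a for a, b in zip(nkd, kd)]
n = len(diffs)                                           # 11 experiments
mean = sum(diffs) / n
sd = math.sqrt(sum((d - mean) ** 2 for d in diffs) / n)  # divisor n
scaled = sd / math.sqrt(n - 3)                           # s / sqrt(8)
p_yield = NormalDist().cdf(mean / scaled)
print(f"yield: mean {mean:.1f}, s.d. {sd:.1f}, s/sqrt(8) {scaled:.1f}, p = {p_yield:.3f}")

# The other three columns, using the averages and s/sqrt(8) given in the table.
for label, d, s8 in (("worse quality", 0.91, 0.28),
                     ("more straw", 1.10, 0.80),
                     ("lower total value", 1.14, 2.40)):
    p = NormalDist().cdf(d / s8)
    print(f"{label}: p = {p:.4f}, odds about {p / (1 - p):.1f} to 1")
```

This reproduces the mean difference of 33.7, the standard deviation of 63.1 and s/√8 = 22.3. The normal integral then gives p ≈ 0.935, slightly above the 0.934 that the text finds from the rounded ratio 1.51. The remaining lines give odds of roughly 1700, 11 and 2 to 1.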

Section X. Conclusions 

1. A curve has been found representing the frequency distribution of standard
deviations of samples drawn from a normal population.

2. A curve has been found representing the frequency distribution of values
of the means of such samples, when these values are measured from the mean of
the population in terms of the standard deviation of the sample.

3. It has been shown that this curve represents the facts fairly well even when 
the distribution of the population is not strictly normal. 

4. Tables are given by which it can be judged whether a series of experiments, 
however short, have given a result which conforms to any required standard of 
accuracy or whether it is necessary to continue the investigation. 

Finally I should like to express my thanks to Prof. Karl Pearson, without 
whose constant advice and criticism this paper could not have been written. 

* As pointed out in Section V, the normal curve gives too large a value for p when the
probability is large. I find the true value in this case to be p = 0.9976. It matters little,
however, to a conclusion of this kind whether the odds in its favour are 1660 to 1 or merely
416 to 1.
PROBABLE ERROR OF A CORRELATION
COEFFICIENT

[Biometrika, VI (1908), p. 302]

At the discussion of Mr R. H. Hooker's recent paper “The correlation of
weather and crops” (Journal of the Royal Statistical Society, 1907) Dr Shaw made
[…] question was answered by Messrs Yule and Hooker and Prof. Edgeworth,
[…] Mr Hooker was probably safe in taking 0.50 as the
limit of significance for a sample of 21. They did not, however, answer Dr Shaw's
question in any more general way. Now Mr Hooker is not the only statistician
who is forced to work with very small samples, and until Dr Shaw's question has
[…] the results of such investigations lack the criterion which
would enable us to make full use of them. The present paper, which is an account
of some sampling experiments, has two objects: (1) to throw some light by em[…]

A random sample has been obtained from an indefinitely large* population
and r calculated between two variable characters of the individuals composing the
sample. We require the probability that R for the population from which the
sample is drawn shall lie between any given limits.

It is clear that in order to solve this problem we must know two things: (1) the
[…], and (2) the a priori probability that R for the population […]
given limits. Now (2) can hardly ever be known, so […]
in general be made; […] (1) […]
with farming under conditions not modern; but one can imagine the […]
years to be a sample from this population indefinitely increased and the […]
what will be the best assumption to make, but meanwhile I may suggest two
more or less obvious distributions. The first is that any value is equally likely
between +1 and -1, and the second that the probability that x is the value is
proportional to 1 - x²: this I think is more in accordance with ordinary experience:
the distribution of a priori probability would then be expressed by the equation

$$y = \frac{3}{4}\left(1 - x^{2}\right).$$

But whatever assumption be made, it will be necessary to know (1), so that
the solution really turns on the distribution of r for samples drawn from the same
population. Now this has been determined for large samples with as much accuracy
as is required, for Pearson and Filon (Phil. Trans. A, cxci, p. 229 et seq.) showed
that the standard deviation is (1 - r²)/√n, and of course for large samples the
distribution is sure to be practically normal unless r is very close to unity. But their
method involves approximations which are not legitimate when the sample is
small. Besides this the distribution is not then normal, so that even if we had the
standard deviation a great deal would still remain unknown.

In order to throw some light on this question I took a correlation table
containing 3000 cases of stature and length of left middle finger of criminals, and
proceeded to draw samples of four from this population.† This gave me 750
values of r for a population whose real correlation was 0.66. By taking the statures
of one sample with the middle finger lengths of the next sample I was enabled to
get 750 values of r for a population whose real correlation was zero. Next I
combined each of the samples of four with the tenth sample before it and with the
tenth sample after it, thus obtaining two sets of 750‡ values from samples of 8,
with real correlation 0.66 and zero.

Besides this empirical work it is possible to calculate a priori the distribution
for samples of two as follows.

For clearly the only values possible are +1 and -1, since two points must
always lie on the regression line which joins them.§

Next consider the correlation between the difference between the values of one
character in two successive individuals, and the difference between the values of
the other character in the same individuals. It is well known to be the same as
that between the values themselves, if the individuals be in random order.

Also if an indefinitely large number of such differences be taken, it is clear that
the means of the distributions will have the value zero. Hence if the correlation
be determined from a fourfold division through zero we can apply Mr Sheppard's‖
result that if A and B be the numbers in the large and the small divisions of the
table respectively,

$$\cos\frac{\pi B}{A+B} = R,$$

where R is the correlation of the original system.

* Biometrika, I, p. 219: W. R. Macdonell.

† Biometrika, VI, p. 13. [2, p. 23.]

‡ Not strictly independent, but practically sufficiently nearly so. This method was adopted […]

§ There are of course indeterminate cases when the values are the same for one character,
but they become rarer as we decrease the unit of grouping until, with an infinitesimal unit of
grouping, the statement in the text is true.

‖ Phil. Trans. A, cxcii, p. 141.

But if a pair of individuals whose difference falls in either of the small divisions
be considered to be a random sample of 2, their r will be found to be -1, while
that of a pair whose difference falls in one of the large divisions is +1. Hence
the distribution of r for samples of 2 is AN at +1, and BN at -1, where A + B = 1
and

$$B = \frac{\cos^{-1}R}{\pi}.$$

When R = 0, there is of course even division, half the values being +1, and
half -1; when R = 0.66,

$$B = \frac{\cos^{-1}0.66}{\pi} = 0.271,$$

therefore A = 0.729, and the mean is at 0.729 - 0.271 = 0.458. The
s.d. = √{1 - (0.458)²} = 0.889. It is noteworthy that the mean value is
considerably less than R.

I have dealt with the cases of samples of 2 at some length, because it is possible
that this limiting value of the distribution, with its mean of (2/π) sin⁻¹ R and its
second moment coefficient of 1 - {(2/π) sin⁻¹ R}², may furnish a clue to the
distribution when n is greater than 2.
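Sheppard's quadrant result, and the distribution of r for samples of two that follows from it, can be checked numerically. The sketch draws bivariate normal pairs with Python's `random` module, using a fixed seed so that the run is repeatable. `fraction_negative` is an illustrative name. For two points, r is +1 or -1 according as the two differences agree or disagree in sign. The simulated proportion of r = -1 should therefore approach B = cos⁻¹R/π.

```python
import math
import random

R = 0.66
B = math.acos(R) / math.pi     # proportion of samples of two with r = -1
A = 1 - B                      # proportion with r = +1
mean = A - B                   # equal to (2/pi) * asin(R)
sd = math.sqrt(1 - mean ** 2)  # r**2 = 1 always, so the second moment about 0 is 1


def fraction_negative(rho, trials=100_000, seed=1):
    """Simulated fraction of samples of two (bivariate normal, correlation rho) with r = -1."""
    rng = random.Random(seed)
    k = math.sqrt(1 - rho ** 2)
    negative = 0
    for _ in range(trials):
        x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
        y1 = rho * x1 + k * rng.gauss(0, 1)
        y2 = rho * x2 + k * rng.gauss(0, 1)
        if (x1 - x2) * (y1 - y2) < 0:  # the two differences disagree in sign
            negative += 1
    return negative / trials


print(f"B = {B:.4f}, mean = {mean:.4f}, s.d. = {sd:.4f}")
print(f"simulated B = {fraction_negative(R):.4f}")
```

Without intermediate rounding, B ≈ 0.2706, the mean ≈ 0.4589 and the s.d. ≈ 0.8885. The text's 0.458 and 0.889 come from working with the rounded A and B.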

Besides these series, I have another shorter one of 100 values of r from samples 
of 30, when the real value is 0.66. The distributions of the various trials are given
in the table below. 

Several peculiarities will be noticed which are due to the effects of grouping, 
particularly in the samples of 4. Firstly, there is a lump at zero; with such small 
numbers zero is not an uncommon value of the product moment and then, 
whatever the values of the standard deviations, r = 0.

Next there are five indeterminate cases in each of the distributions for samples 
of 4. These are due to the whole sample falling in the same group for one variable. 
In such a case, both the standard deviation and the product moment vanish and 
r is indeterminate. 

Lastly, with such small samples one cannot use Sheppard’s corrections for the 
standard deviations, as r often becomes greater than unity. So I did not use 
the corrections except in the case of the samples of 30, yet on the whole the values 
of the standard deviations are no doubt too large. This does not much affect the
values of r in the neighbourhood of zero, but there is a tendency for larger values
to come too low, so that there is a deficiency of cases towards 1 and -1. This
introduces an error into the standard deviation of all the series to some extent,
but of course the mean is unaltered when there is no correlation. The series for
samples of 4 are affected more than those from samples of 8, as the mean standard
deviation of samples of 4 is the smaller, so that the unit of grouping is
comparatively larger.
The moment coefficients of the five distributions were determined, and the
following values found:*

 | Mean | S.D. | μ₂ | μ₃ | μ₄ | β₁ | β₂
Samples of 4 (r = 0) | | 0.5512 | 0.3038 | | 0.1768 | | 1.918
Samples of 8 (r = 0) | | 0.3731 | 0.1392 | | 0.0454 | | 2.336
Samples of 4 (r = 0.66) | 0.5609 | 0.4680 | 0.2190 | -0.1570 | 0.2152 | 2.245 | 4.489
Samples of 8 (r = 0.66) | 0.6139 | 0.2684 | 0.07202 | -0.02634 | 0.02714 | 1.857 | 5.232
Samples of 30 (r = 0.66) | 0.661 | 0.1001 | 0.01003 | -0.000882 | 0.000461 | 0.7713 | 4.580
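Pearson's constants β₁ = μ₃²/μ₂³ and β₂ = μ₄/μ₂² follow directly from the moment coefficients. The sketch recomputes them from the printed μ's. The dictionary layout and names are illustrative, and no imports are needed.

```python
# (mu2, mu3, mu4) from the table; mu3 is None where symmetry was assumed.
MOMENTS = {
    "samples of 4, r = 0": (0.3038, None, 0.1768),
    "samples of 8, r = 0": (0.1392, None, 0.0454),
    "samples of 4, r = 0.66": (0.2190, -0.1570, 0.2152),
    "samples of 8, r = 0.66": (0.07202, -0.02634, 0.02714),
    "samples of 30, r = 0.66": (0.01003, -0.000882, 0.000461),
}


def betas(mu2, mu3, mu4):
    """Pearson's beta1 = mu3**2 / mu2**3 and beta2 = mu4 / mu2**2 (beta1 None if mu3 unknown)."""
    beta1 = None if mu3 is None else mu3 ** 2 / mu2 ** 3
    return beta1, mu4 / mu2 ** 2


for label, (mu2, mu3, mu4) in MOMENTS.items():
    b1, b2 = betas(mu2, mu3, mu4)
    b1_text = "-" if b1 is None else f"{b1:.3f}"
    print(f"{label:24s} beta1 {b1_text:>6s}  beta2 {b2:.3f}")
```

The printed β₂ values are recovered to within the rounding of the μ's, as are the β₁ values for samples of 8 and 30. For samples of 4 with r = 0.66, however, the printed μ₂ and μ₃ give β₁ ≈ 2.35, not the 2.245 shown in the table.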


Considering first the “no correlation” distributions, I attempted to fit a Pearson
curve to the first of them. As might be expected, the range proved limited and
as symmetry had been assumed in calculating the moments, a Type II curve
resulted:

$$y = y_0\left(1 - \frac{x^{2}}{a^{2}}\right)^{0.272},$$

the range 2a of which is 2.074.

Now the real range is clearly 2, and only a very small alteration in β₂ is required
to make the value of the index zero. Consequently the equation y = y₀(1 - x²)⁰
was suggested. This means an even distribution of r between 1 and -1, with
s.d. = 0.5774 ± 0.010 vice 0.5512 actual, μ₂ = 0.3333 ± 0.0116 vice 0.3038,
μ₄ = 0.2000 ± 0.016 vice 0.1768 and β₂ = 1.800 ± 0.12 vice 1.918, all values as
close as could perhaps be expected considering that the grouping must make both
μ₂ and μ₄ too low.
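Fitting the symmetric Type II curve by moments needs only μ₂ and β₂. For y = y₀(1 - x²/a²)^m, μ₂ = a²/(2m + 3) and β₂ = 3(2m + 3)/(2m + 5). Solving these for m and a gives the expressions in the sketch below. The function name is illustrative. The same formulas show that the index vanishes exactly when β₂ = 9/5, which is the even distribution on (-1, 1).

```python
import math


def pearson_type_ii(mu2, beta2):
    """Index m and half-range a of y = y0 * (1 - x**2 / a**2)**m, fitted by moments.

    For this curve mu2 = a**2 / (2m + 3) and beta2 = 3(2m + 3) / (2m + 5);
    solving for m and a gives the expressions below (requires beta2 < 3).
    """
    m = (5 * beta2 - 9) / (2 * (3 - beta2))
    a = math.sqrt(2 * mu2 * beta2 / (3 - beta2))
    return m, a


m, a = pearson_type_ii(0.3038, 1.918)  # samples of 4, no correlation
print(f"index {m:.3f}, range {2 * a:.3f}")

# Index zero: beta2 = 9/5, the even distribution on (-1, 1), whose
# mu2 = 1/3 and mu4 = 1/5 are the values quoted in the text.
m0, a0 = pearson_type_ii(1 / 3, 9 / 5)
print(f"index {m0:.3f}, half-range {a0:.3f}")
```

With the printed μ₂ = 0.3038 and β₂ = 1.918, this gives an index of about 0.273 and a range of about 2.076. The text has 0.272 and 2.074; the difference is within the rounding of the moments.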

Working from y = y₀(1 - x²)⁰ for samples of 4, I guessed the formula

$$y = y_0\left(1 - x^{2}\right)^{\frac{n-4}{2}}$$

and proceeded to calculate the moments.

By using the transformation x = sin θ, we get y = y₀ cos^(n-4) θ, dx = cos θ dθ,

$$2\int_0^1 y\,dx = 2y_0\int_0^{\pi/2}\cos^{n-3}\theta\,d\theta,$$

$$2\int_0^1 x^{2}y\,dx = 2y_0\int_0^{\pi/2}\cos^{n-3}\theta\,d\theta - 2y_0\int_0^{\pi/2}\cos^{n-1}\theta\,d\theta,$$

and so on.

Whence

$$\mu_2 = \frac{1}{n-1}, \qquad \mu_4 = \frac{3}{(n-1)(n+1)}, \qquad \beta_2 = \frac{3(n-1)}{n+1} = 3 - \frac{6}{n+1}.$$

Putting n = 8 we get the equation y = y₀(1 - x²)² and

μ₂ = 1/7 = 0.1429 ± 0.0050 instead of actual 0.1392,
μ₄ = 1/21 = 0.0476 ± 0.0038 instead of actual 0.0454,
σ = 0.3780 ± 0.0066 instead of actual 0.3731,
β₂ = 3 - 2/3 = 2.333 ± 0.012 instead of actual 2.336.
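The closed forms μ₂ = 1/(n - 1) and μ₄ = 3/((n - 1)(n + 1)) can be checked by integrating the curve numerically. The sketch uses a plain midpoint rule on (-1, 1), which is our choice of method. It is meant for n ≥ 4, where the curve stays bounded. No imports are needed.

```python
def moments(n, steps=20_000):
    """mu2 and mu4 of y = y0 * (1 - x**2)**((n - 4) / 2) on (-1, 1), midpoint rule.

    Meant for n >= 4, where the curve is bounded.
    """
    h = 2.0 / steps
    xs = [-1 + (i + 0.5) * h for i in range(steps)]
    ys = [(1 - x * x) ** ((n - 4) / 2) for x in xs]
    total = sum(ys)
    mu2 = sum(x * x * y for x, y in zip(xs, ys)) / total
    mu4 = sum(x ** 4 * y for x, y in zip(xs, ys)) / total
    return mu2, mu4


for n in (4, 5, 8, 21):
    mu2, mu4 = moments(n)
    print(f"n = {n:2d}: mu2 {mu2:.5f} vs 1/(n-1) {1 / (n - 1):.5f}; "
          f"mu4 {mu4:.5f} vs 3/((n-1)(n+1)) {3 / ((n - 1) * (n + 1)):.5f}")
```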

* In the cases of no correlation the moments were taken about zero, the known centroid 
of the distribution. 
[…]

$$y = y_0\left(1 - \frac{x^{2}}{0.9802}\right)^{2.021},$$

whence the calculated range is 1.98, whereas it is known to be 2.

The following tables compare the actual distributions with those calculated 
from the equations. 


Distribution of r from samples of 4 compared with the equation

y = y₀(1 - x²)⁰

From this we get χ² = 13.30, P = 0.34. It will however be noticed that the
grouping has caused all the middle compartments to contain more than the
calculated, as pointed out above.


Distribution of r from samples of 8 compared with the equation

y = y₀(1 - x²)²

Whence χ² = 13.94, P = 0.30.


In this case the grouping has had less influence and the largest contributions 
to χ² (in the second, sixth, eighth and twelfth compartments) are due to differences
of opposite sign on opposite sides, and may therefore be supposed to be entirely 
due to random sampling. 

My equation then fits the two series of empirical results about as well as could
be expected. I will now show that it is in accordance with the two theoretical
cases, $n$ "large" and $n = 2$. When $n$ is large, $\sigma = 1/\sqrt{n-1}$ approximates sufficiently
closely to Pearson and Filon's $(1 - r^2)/\sqrt{n}$ with $r = 0$; also $\beta_2$ becomes 3 and the
distribution is normal.
And if $n = 2$, the equation becomes $y = y_0(1 - x^2)^{-1}$, where

$$\frac{N}{y_0} = 2\int_0^1 (1 - x^2)^{-1}\,dx.$$

Put $x = \sin\theta$. Then $dx = \cos\theta\,d\theta$ and

$$\frac{N}{y_0} = 2\int_0^{\pi/2}\sec\theta\,d\theta = \infty, \qquad\text{so}\qquad y_0 = \frac{N}{\infty} = 0,$$

i.e. there is no frequency except where $(1 - x^2)^{-1}$ is infinite; all the frequency is
equally divided between $x = 1$ and $x = -1$, which we know to be actually the case.

Consequently I believe that the equation $y = y_0(1 - x^2)^{\frac{n-4}{2}}$ probably represents
the theoretical distribution of $r$ when samples of $n$ are drawn from a normally
distributed population with no correlation. Even if it does not do so, I am sure
that it will give a close approximation to it.
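Student's check was empirical, and the same kind of check can be repeated by simulation. The sketch below is ours, not the paper's. It draws uncorrelated normal pairs with Python's `random` module, computes $r$ for each sample of $n$, and takes the moments about zero (as in the footnote above) for comparison with $\mu_2 = 1/(n-1)$ and $\mu_4 = 3/\{(n-1)(n+1)\}$. The number of trials and the seed are arbitrary choices.

```python
import math
import random


def sample_r(n, rng):
    """Product-moment r for one sample of n pairs of independent normal variates."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [rng.gauss(0.0, 1.0) for _ in range(n)]
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulated_moments(n, trials=20000, seed=1):
    """Second and fourth moments of r about zero (its known centroid) over many samples."""
    rng = random.Random(seed)
    rs = [sample_r(n, rng) for _ in range(trials)]
    m2 = sum(r ** 2 for r in rs) / trials
    m4 = sum(r ** 4 for r in rs) / trials
    return m2, m4


n = 8
print(simulated_moments(n), (1 / (n - 1), 3 / ((n - 1) * (n + 1))))
```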

Let us consider Mr Hooker's limit of 0.50 in the light of this equation. For
21 cases the equation becomes $y = y_0(1 - x^2)^{17/2}$, i.e. $y = y_0\cos^{17}\theta$, and the proportion
of the area lying beyond $x = \pm 0.50$ will be

$$\frac{\int_{\sin^{-1}0.50}^{\pi/2}\cos^{18}\theta\,d\theta}{\int_0^{\pi/2}\cos^{18}\theta\,d\theta},$$

and this is found to be 0.02099, or we may expect to find one case in 50 occurring
outside the limits ±0.50 when there is no correlation and the sample numbers 21.
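The ratio of integrals above is easy to evaluate numerically. Here is a minimal sketch, assuming Simpson's rule is accurate enough for this smooth integrand (`prob_abs_r_exceeds` is a hypothetical helper name). By symmetry, the ratio of the half-range integrals equals the two-tailed share beyond $\pm r_0$. For $n = 21$ and a limit of 0.50 the result should land close to the 0.02099 quoted.

```python
import math


def simpson(f, a, b, m=2000):
    """Composite Simpson's rule with m (even) sub-intervals."""
    h = (b - a) / m
    s = f(a) + f(b)
    for i in range(1, m):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3


def prob_abs_r_exceeds(r0, n):
    """Share of y = y0 (1 - x^2)^((n-4)/2) lying beyond x = +/- r0, for n >= 3.

    With x = sin(theta) the frequency element y dx becomes y0 cos^(n-3)(theta) d(theta).
    """
    def f(t):
        return math.cos(t) ** (n - 3)

    return simpson(f, math.asin(r0), math.pi / 2) / simpson(f, 0.0, math.pi / 2)


print(round(prob_abs_r_exceeds(0.50, 21), 5))
```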

When there is correlation, I cannot suggest an equation which will
accord with the facts; but as I have spent a good deal of time over the problem
I will point out some of the necessities of the case.

(1) With small samples the mean value certainly lies nearer to zero than the real
value of $R$, e.g.

Samples of 2: mean at $\frac{2}{\pi}\sin^{-1}R$,

Samples of 4 (real value 0.66): 0.561† ± 0.011,

Samples of 8 (real value 0.66): 0.614‡ ± 0.0065.

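* If a Pearson curve be fitted to the distribution whose moment coefficients are $\mu_2 = 1 = \mu_4$,
given by $\beta_1 = 0$, $\beta_2 = 1$, the curve is of Type II and the equation is

$$y = y_0\left(1 - \frac{x^2}{a^2}\right)^m, \quad\text{where}\quad a^2 = \frac{2\mu_2\beta_2}{3 - \beta_2} = 1 \quad\text{and}\quad m = \frac{5\beta_2 - 9}{2(3 - \beta_2)} = -1, \text{ or } n = 2,$$

agreeing with the general formula.

† … larger than this (perhaps by 0.08) as Sheppard's …

‡ Again higher, but not by more than 0.02.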
But with samples of 30 (real value 0.66), the mean, at 0.6609 ± 0.0067, shows that the
mean value approaches the real value comparatively rapidly.
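For samples of 2 the value of $r$ can only be $+1$ or $-1$, so its mean is $P(+1) - P(-1)$. The simulation sketch below is ours (seed and number of trials are arbitrary). It compares the simulated mean with $\frac{2}{\pi}\sin^{-1}R$ for the real value 0.66 used above, and shows how far below $R$ that mean lies.

```python
import math
import random


def mean_r_samples_of_2(R, trials=40000, seed=2):
    """Simulated mean of r for samples of 2 from a bivariate normal with correlation R."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        x1, x2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # y has unit variance and correlation R with x
        y1 = R * x1 + math.sqrt(1 - R * R) * rng.gauss(0.0, 1.0)
        y2 = R * x2 + math.sqrt(1 - R * R) * rng.gauss(0.0, 1.0)
        # a line through two points fits them exactly, so r is +1 or -1
        total += 1 if (x1 - x2) * (y1 - y2) > 0 else -1
    return total / trials


R = 0.66
print(mean_r_samples_of_2(R), 2 / math.pi * math.asin(R))
```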

(2) The standard deviation is larger than accords with the formula

$$\frac{1 - r^2}{\sqrt{n-1}},$$

even if we give $r$ the mean value of $r$ for samples of the size taken, e.g.

For samples of 4: calculated* 0.3967 ± 0.0069; actual 0.4680,

For samples of 8: calculated 0.2355 ± 0.0041; actual 0.2684.

But samples of 30, calculated 0.1046 ± 0.0018, actual 0.1001, again show that
with samples as large as 30 the ordinary formula is justified.
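The "calculated" values for samples of 8 and 30 follow directly from the formula once the observed mean $r$ is inserted. Here is a short check (the function name is ours):

```python
import math


def sd_r_formula(r, n):
    """Standard deviation of r from (1 - r^2)/sqrt(n - 1), with r the observed mean."""
    return (1 - r * r) / math.sqrt(n - 1)


for n, mean_r in ((8, 0.614), (30, 0.6609)):
    print(n, round(sd_r_formula(mean_r, n), 4))
```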

(3) When there was no correlation the range found by fitting a Pearson curve
to the distribution was accurately 2 in the theoretical case of samples of 2, and
well within the probable error for empirical distributions of samples of 4 and 8.
But when we have correlation this process does not give the range closely for the
empirical distribution (samples of 4 give 2.137, samples of 8, 2.699, samples of
30, infinity) and the range calculated from samples of 2, which is

2 *J( 4 + 3/4 2 + 18/dj — 9/4) 

3 +/^2 


(where $\mu_2 = 1 - \left\{\frac{2}{\pi}\sin^{-1}R\right\}^2$), is always less than 2 except in the case where
$\mu_2$ is 1, i.e. when there is no correlation.

Hence the distribution probably cannot be represented by any of Prof. Pearson's
types of frequency curve unless $R = 0$.

(4) The distribution is skew with a tail towards zero. 

(5) To sum up: if $y = \phi(x, R, n)$ be the equation, it must satisfy the following
requirements. If $R = 1$, 1 is the only value of $x$ which gives a value of $y$ other
than zero. If $n = 2$, ±1 are the only values of $x$ to do so. If $R = 0$, the equation
probably reduces to $y = y_0(1 - x^2)^{\frac{n-4}{2}}$.


Conclusions 

It has been shown that when there is no correlation between two normally
distributed variables $y = y_0(1 - x^2)^{\frac{n-4}{2}}$ gives fairly closely the distribution of $r$
found from samples of $n$.

Next, the general problem has been stated and three distributions of r have 
been given which show the sort of variation which occurs. I hope they may serve 
as illustrations for the successful solver of the problem. 

* From $(1 - r^2)/\sqrt{n-1}$, where $r$ is taken as the mean value for the size of the sample. If we took
the real value $R$, the difference would be even greater.
THE DISTRIBUTION OF THE MEANS OF SAMPLES
WHICH ARE NOT DRAWN AT RANDOM 

[Biometrika, VII (1909), p. 210] 

It is one of the advantages of the normal curve that if samples are drawn at
random from any population, no matter how distributed, the distributions of
the means of the samples rapidly approach the Gaussian as the
samples grow large.

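Now the result of grouping 2000 in samples of 25 given in Drs Greenwood and
White's interesting paper in Biometrika (vi (1909), pp. 376-401) is surprising,
for Henderson* has shown that if $\beta_1$, $\beta_2$ are the constants of the distribution of the
population and $B_1$, $B_2$ the corresponding constants of the means of samples of $n$
drawn at random,

$$B_1 = \frac{\beta_1}{n} \qquad\text{and}\qquad B_2 - 3 = \frac{\beta_2 - 3}{n}.$$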

But in this case $n = 25$ and

$\beta_1 = 1.7977$ and $B_1 = 0.4756$, while $\beta_1/n = 0.0719$,

$\beta_2 - 3 = 2.5790$ and $B_2 - 3 = 0.3185$, while $(\beta_2 - 3)/n = 0.1032$.

Neither of these can be considered significant with a sample of 80 means,
but at the same time they are both sufficiently different to suggest that the
conditions which led to the theoretical result have not been fulfilled.

The first thing which occurred to me was that, as Sheppard's corrections had
not been used for the original distribution, they ought to be.
This, however, makes but little difference, for we get

$\beta_1 = 1.9898$, so that $\beta_1/25 = 0.0796$,

$\beta_2 - 3 = 2.7725$, so that $(\beta_2 - 3)/25 = 0.1109$.
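Henderson's relations reduce the comparison to simple division. The sketch below (function name ours) applies them, with $n = 25$, to both the raw and the Sheppard-corrected constants quoted above.

```python
def henderson(beta1, beta2, n):
    """Constants of means of n drawn at random: B1 = beta1/n, B2 - 3 = (beta2 - 3)/n."""
    return beta1 / n, 3 + (beta2 - 3) / n


raw = henderson(1.7977, 3 + 2.5790, 25)
corrected = henderson(1.9898, 3 + 2.7725, 25)  # after Sheppard's corrections
print(raw, corrected)
```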


I then considered the possibility that the samples were not strictly random but
that there was some slight correlation between successive observations.

* Henderson, R., J. Inst. Actu. xli, pp. 429-42.
I therefore assumed that the individuals composing the sample were more like
each other than to the rest of the population, that in fact there was homotyposis,
and working from this hypothesis I found that the slightest correlation produces
a very marked retardation in the approach to normality with increase in the
size of the sample.

It will be observed that this is essentially a "small sample" problem, for with
increase in the size of the sample the correlation due to likeness between successive
individuals diminishes except in exceptional cases, when it becomes manifest
as a well-marked heterogeneity.

My results emphasize the necessity of avoiding anything which tends to produce 
secular variation and as far as possible to neutralize it by repeating observations 
only after some time has elapsed. 

Thus repetitions of analyses in a technical laboratory should never follow one
another, but an interval of at least a day should occur between them. Otherwise
a spurious accuracy will be obtained which greatly reduces the value of the

In the present case there is not sufficient evidence to show whether correlation
was really present, but as in the course of a fairly extended practice I have not
yet met with observations in which this tendency was altogether absent, I incline
to the belief that it was.

In any case, being ignorant of the technique, I can only suggest as possibilities
slight variations from point to point on the slide, differences in light, or in the
observer as the day went on.

The general problem is as follows: 

Let samples of $n$ be drawn from a population with constants $\mu_2, \mu_3, \mu_4, \beta_1, \beta_2$,
and let the samples be drawn in such a manner that the individuals composing
each sample are correlated with correlation coefficient $r$; then, assuming linear
regression and homoscedastic arrays, the constants of the distribution of their
means ($M_2, M_3, M_4, B_1, B_2$) are as follows:

$$M_2 = \frac{\mu_2}{n}\{1 + (n-1)r\},$$

$$M_3 = \frac{\mu_3}{n^2(1+r)}\{1 + (n-1)r\}\{1 + (2n-1)r\},$$

$$M_4 = \frac{1 + (n-1)r}{n^3(1+2r)}\left[\{1 + (3n-1)r + 3n(n-1)r^2\}\mu_4 + 3(n-1)(1-r)(1+nr)\mu_2^2\right],$$

$$B_1 = \frac{\beta_1\{1 + (2n-1)r\}^2}{n(1+r)^2\{1 + (n-1)r\}},$$

$$B_2 = \frac{\beta_2\{1 + (3n-1)r + 3n(n-1)r^2\}}{n(1+2r)\{1 + (n-1)r\}} + \frac{3(n-1)(1-r)(1+nr)}{n(1+2r)\{1 + (n-1)r\}}.$$
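The formulas for $M_2$, $B_1$ and $B_2$ can be coded directly. This is a sketch under the stated assumptions (equal correlation $r$ between members of a sample, linear regression, homoscedastic arrays), and the function name is ours. At $r = 0$ it reduces to Henderson's results. For a normal population ($\beta_2 = 3$) it gives $B_2 = 3$ for any $r$. With the Sheppard-corrected $\beta_1 = 1.9898$, $n = 25$ and $r = 0.01$ it gives $B_1 \approx 0.1397$. The value $\mu_2 = 1$ in the usage line is only a placeholder, since it does not affect $B_1$ or $B_2$.

```python
def means_constants(n, r, mu2, beta1, beta2):
    """M2, B1, B2 for means of samples of n whose members have mutual correlation r.

    Assumes, as in the text, linear regression and homoscedastic arrays.
    """
    a = 1 + (n - 1) * r
    M2 = mu2 * a / n
    B1 = beta1 * (1 + (2 * n - 1) * r) ** 2 / (n * (1 + r) ** 2 * a)
    B2 = (beta2 * (1 + (3 * n - 1) * r + 3 * n * (n - 1) * r * r)
          + 3 * (n - 1) * (1 - r) * (1 + n * r)) / (n * (1 + 2 * r) * a)
    return M2, B1, B2


# Sheppard-corrected population constants quoted above, samples of 25, r = 0.01
print(means_constants(25, 0.01, 1.0, 1.9898, 3 + 2.7725))
```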

As the method of determining the three moment coefficients is the same in each
case and it is merely a question of reduction to obtain $B_1$ and $B_2$, it will be
sufficient for me to give the proof for $M_4$.

Let $x_1, x_2, \ldots, x_n$ be the values, measured from the mean of the population, of
the individuals composing the typical sample, and let there be $N$ such samples.

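Then

$$M_4 = \frac{1}{N}\Sigma\left(\frac{x_1 + x_2 + \cdots + x_n}{n}\right)^4 = \frac{\Sigma\{S(x_1^4) + 4S(x_1^3x_2) + 6S(x_1^2x_2^2) + 12S(x_1^2x_2x_3) + 24S(x_1x_2x_3x_4)\}}{Nn^4} \quad\ldots\text{(i)}$$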


Taking each of these five terms in turn we have

$$\frac{\Sigma\{S(x_1^4)\}}{Nn^4} = \frac{n\,\Sigma(n_{x_1}x_1^4)}{Nn^4} = \frac{\mu_4}{n^3} \quad\ldots\text{(ii)}$$

For $S(x_1^4)$ has $n$ terms, and when they are taken over all the $N$ samples which
compose the population there will be $n \cdot n_{x_1}$ of $x_1^4$, $n_{x_1}$ being the number of $x_1$'s in
the population and $n_{x_1x_2}$ the number of $x_1$'s associated with $x_2$'s, and so on.
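Again, there are $n(n-1)$ terms in $S(x_1^3x_2)$,

$$\frac{\Sigma\{4S(x_1^3x_2)\}}{Nn^4} = \frac{4(n-1)\,\Sigma(n_{x_1x_2}x_1^3x_2)}{Nn^3} = \frac{4(n-1)\,\Sigma(n_{x_1}\cdot x_1^3\cdot\text{mean value of }x_2)}{Nn^3}.$$

But the mean value of $x_2$ associated in the sample with $x_1$ will be $r\frac{\sigma_{x_2}}{\sigma_{x_1}}x_1$, or,
since $\sigma_{x_1} = \sigma_{x_2}$, it is $rx_1$.

$$\therefore\ \frac{\Sigma\{4S(x_1^3x_2)\}}{Nn^4} = \frac{4(n-1)\,\Sigma(n_{x_1}\cdot x_1^4\cdot r)}{Nn^3} = \frac{4(n-1)r\mu_4}{n^3} \quad\ldots\text{(iii)}$$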

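Next,

$$\frac{\Sigma\{6S(x_1^2x_2^2)\}}{Nn^4} = \frac{3(n-1)}{n^3}\cdot\frac{\Sigma(n_{x_1x_2}\cdot x_1^2x_2^2)}{N} = \frac{3(n-1)}{n^3}\cdot\frac{\Sigma(n_{x_1}\cdot x_1^2\cdot\text{mean value of }x_2^2)}{N}.$$

[Now the mean value of $x_2^2$ is equal to the square of the s.d. of the $x_1$ array of
$x_2$'s, $\mu_2(1 - r^2)$, added to the square of the mean value of $x_2$, $r^2x_1^2$.]

$$= \frac{3(n-1)}{n^3}\cdot\frac{\Sigma n_{x_1}\{r^2x_1^4 + x_1^2\mu_2(1 - r^2)\}}{N} = \frac{3(n-1)}{n^3}\{r^2\mu_4 + (1 - r^2)\mu_2^2\} \quad\ldots\text{(iv)}$$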
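Again,

$$\frac{\Sigma\{12S(x_1^2x_2x_3)\}}{Nn^4} = \frac{6(n-1)(n-2)}{n^3}\cdot\frac{\Sigma(n_{x_1x_2x_3}\cdot x_1^2x_2x_3)}{N} = \frac{6(n-1)(n-2)}{n^3}\cdot\frac{\Sigma(n_{x_1x_2}\cdot x_1^2x_2\cdot\text{mean value of }x_3)}{N}.$$

The mean value of $x_3$ for values $x_1$ and $x_2$ of the other two variables is given by
the equation

$$m_{x_3} = -\frac{\sigma_{x_3}}{R_{33}}\left(\frac{R_{31}x_1}{\sigma_{x_1}} + \frac{R_{32}x_2}{\sigma_{x_2}}\right),$$

where the $R$'s are the minors of the determinant

$$\begin{vmatrix} 1 & r & r \\ r & 1 & r \\ r & r & 1 \end{vmatrix},$$

or

$$m_{x_3} = (x_1 + x_2)\frac{r - r^2}{1 - r^2} = (x_1 + x_2)\frac{r}{1 + r}.$$

Substituting, we get

$$\frac{\Sigma\{12S(x_1^2x_2x_3)\}}{Nn^4} = \frac{6(n-1)(n-2)}{n^3}\cdot\frac{r}{1+r}\cdot\frac{\Sigma\{n_{x_1x_2}\,x_1^2x_2(x_1 + x_2)\}}{N}.$$

By (iii) and (iv),

$$\frac{\Sigma\{12S(x_1^2x_2x_3)\}}{Nn^4} = \frac{6(n-1)(n-2)}{n^3}\cdot\frac{r}{1+r}\{r\mu_4 + r^2\mu_4 + (1 - r^2)\mu_2^2\} = \frac{6(n-1)(n-2)r}{n^3}\{r\mu_4 + (1 - r)\mu_2^2\} \quad\ldots\text{(v)}$$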


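Lastly,

$$\frac{\Sigma\{24S(x_1x_2x_3x_4)\}}{Nn^4} = \frac{(n-1)(n-2)(n-3)}{n^3}\cdot\frac{\Sigma(n_{x_1x_2x_3x_4}\cdot x_1x_2x_3x_4)}{N} = \frac{(n-1)(n-2)(n-3)}{n^3}\cdot\frac{\Sigma(n_{x_1x_2x_3}\cdot x_1x_2x_3\cdot\text{mean value of }x_4)}{N}.$$

As before, the mean value of $x_4$ comes from the multiple regression equation

$$m_{x_4} = -\frac{\sigma_{x_4}}{R_{44}}\left(\frac{R_{41}x_1}{\sigma_{x_1}} + \frac{R_{42}x_2}{\sigma_{x_2}} + \frac{R_{43}x_3}{\sigma_{x_3}}\right),$$

where the $R$'s are minors of

$$\begin{vmatrix} 1 & r & r & r \\ r & 1 & r & r \\ r & r & 1 & r \\ r & r & r & 1 \end{vmatrix},$$

$$m_{x_4} = (x_1 + x_2 + x_3)\frac{r(1-r)^2}{1 - 3r^2 + 2r^3} = (x_1 + x_2 + x_3)\frac{r}{1 + 2r}.$$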
Substituting, we get

$$\frac{\Sigma\{24S(x_1x_2x_3x_4)\}}{Nn^4} = \frac{(n-1)(n-2)(n-3)}{n^3}\cdot\frac{r}{1+2r}\cdot\frac{\Sigma\{n_{x_1x_2x_3}\,x_1x_2x_3(x_1 + x_2 + x_3)\}}{N} = \frac{(n-1)(n-2)(n-3)}{n^3}\cdot\frac{r}{1+2r}\cdot\frac{3\,\Sigma(n_{x_1x_2x_3}\,x_1^2x_2x_3)}{N}.$$

Applying (v),

$$\frac{\Sigma\{24S(x_1x_2x_3x_4)\}}{Nn^4} = \frac{3(n-1)(n-2)(n-3)r^2}{n^3(1+2r)}\{r\mu_4 + (1 - r)\mu_2^2\} \quad\ldots\text{(vi)}$$


Substituting (ii)...(vi) in (i), we get

$$M_4 = \frac{1}{n^3}\Big[\mu_4 + 4(n-1)r\mu_4 + 3(n-1)\{r^2\mu_4 + (1 - r^2)\mu_2^2\} + 6(n-1)(n-2)r\{r\mu_4 + (1 - r)\mu_2^2\} + 3(n-1)(n-2)(n-3)\frac{r^2}{1+2r}\{r\mu_4 + (1 - r)\mu_2^2\}\Big],$$

which reduces to the result given above, viz.

$$M_4 = \frac{1 + (n-1)r}{n^3(1+2r)}\left[\{1 + (3n-1)r + 3n(n-1)r^2\}\mu_4 + 3(n-1)(1-r)(1+nr)\mu_2^2\right].$$
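The regression coefficients $r/(1+r)$ and $r/(1+2r)$ used in (v) and (vi) are special cases of a general pattern. When all correlations equal $r$, the regression of one variable on $m$ others gives each of them the coefficient $r/\{1 + (m-1)r\}$. The sketch below checks this by solving the normal equations exactly with fractions. The helper name is ours. It uses Gauss-Jordan elimination without pivoting, which is safe here because the correlation matrix is positive definite for the values used.

```python
from fractions import Fraction


def regression_coefficients(k, r):
    """Coefficients of x_1 .. x_(k-1) in the regression of x_k on them.

    All variables have equal s.d. and every pair has correlation r. Solves the
    normal equations C b = c exactly, C being the correlation matrix of the
    predictors and c their correlations with x_k, by Gauss-Jordan elimination.
    No pivoting is needed while C is positive definite (-1/(k-2) < r < 1).
    """
    r = Fraction(r)
    m = k - 1
    a = [[Fraction(1) if i == j else r for j in range(m)] + [r] for i in range(m)]
    for col in range(m):
        piv = a[col][col]
        a[col] = [v / piv for v in a[col]]
        for row in range(m):
            if row != col and a[row][col] != 0:
                f = a[row][col]
                a[row] = [v - f * w for v, w in zip(a[row], a[col])]
    return [a[i][m] for i in range(m)]


r = Fraction(1, 10)
print(regression_coefficients(3, r), r / (1 + r))
print(regression_coefficients(4, r), r / (1 + 2 * r))
```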

Using these equations it is possible to find values of $r$ which would satisfy the
conditions for the various constants.

Thus (using Sheppard's corrections for both sets of constants) I find that with
the given values

of $\mu_2$ and $M_2$, $r = 0.003$,

of $\beta_1$ and $B_1$, $r = 0.063$,

of $\beta_2$ and $B_2$, $r = 0.033$.

Now if $r$ were fitted by least squares or in any other way from these
three values it must clearly come closest to the $\mu_2$ value, owing to the lower
probable error of $\mu_2$. As a proper fitting would be very complicated owing
to the intercorrelations of the constants, I have assumed a value $r = 0.01$ as a nice
round number; this gives a value of $M_2$ higher than that found in the sample
before us, but not at all impossibly so.
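The paper does not say how each value of $r$ was found from the constants. One way, shown here only as a sketch, is bisection on the $B_1$ formula, which increases with $r$ over the small range of interest for $n = 25$. The function names are ours. The target value in the usage line is hypothetical: it is generated from $r = 0.063$ so that the round trip can be checked.

```python
def b1_of_r(n, r, beta1):
    """B1 for means of samples of n with intra-sample correlation r."""
    return beta1 * (1 + (2 * n - 1) * r) ** 2 / (n * (1 + r) ** 2 * (1 + (n - 1) * r))


def solve_r(target_b1, n, beta1, lo=0.0, hi=0.5, tol=1e-10):
    """Bisection for the r at which B1 equals target_b1.

    Assumes B1 increases with r on [lo, hi], which holds for n = 25 on [0, 0.5].
    """
    def f(r):
        return b1_of_r(n, r, beta1) - target_b1

    if f(lo) > 0 or f(hi) < 0:
        raise ValueError("target B1 is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# hypothetical target, generated from r = 0.063 to show the round trip
target = b1_of_r(25, 0.063, 1.9898)
print(solve_r(target, 25, 1.9898))
```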

This gives $M_2 = 0.1101$, actual 0.1074,

$B_1 = 0.1397$, actual 0.4756,

$B_2 = 3.2012$, actual 3.3185.

These constants give a Type I curve 

(l- 

\ 47-82/ 


( r \ 24*64 

1+ F65/ 


1266*8 


If we assume no correlation I get a curve 

y= 109-0 (l+~V 17 (l- —) 

* \ 1*92/ \ 58-31/ 

whence I get the following "fits".*

* The figures given are really mid-ordinates, but for such small numbers the difference 
between the mid-ordinate and the area on the base unit is negligible. 
Below 1-10

1-10 to 1-22 

M 

M 

O 

-P 

04 

04 

M 

so 

m 

M 

'5' 

Ml 

: M 

1-46 to 1-58 

1-58 to blO 

04 

op 

M 

-8 

m 

1-82 to 1-94 

1-94 to 2-06 

2-06 to 2-18 

2-18 to 2-30 

2-30 to 2-42 

Ml 

04 

o 

-p 

04 

M 

04 

so 

so 

04 

O 

-P 

M 

04 

2-66 to 2-78 

Above 2-78 

Actual 
















■ — 

4 

8 

7 

14 

12 

12 

5 

7 

5 

2 

1 

2J 

0 1 

1 

Y ■ — 



Calculated: No correlation 

1-01 I 2-42 | 5-28 I 8-86 I 11-69 I 13-07 I 12-18 I 9-97 I 6-90 i 4-27 I 2-36 I 1-18 


— | 0-92 i 


Calculated: Correlation 0-01 

1-85 | 3-27 | 6-02 | 8-92 | 11-01 | 11-71 | 10- 84 | 8-95 [ 6-64 | 4-52 [ 2-82 | 1-64 | 0-90 | — | 0-85 | — 


These give $P = 0.46$ and $P = 0.86$ respectively, the first being a good deal
helped by the convention that the tail should not be carried beyond the point at
which a single unit may be expected and the second much less so.

As the empirical curve fitted from the actual moments has a $P$ of 0.92, the
second curve may be considered fairly good, depending as it does on a guess
following on calculation. On the other hand a $P$ of 0.46 with so few cases as 80
is not particularly good, and as Prof. Pearson has pointed out to me the graph
distinctly gives an idea of greater skewness than is represented by the no-correlation
curve. I do not, however, wish to contend that the circumstances attending
the production of the sample actually conformed to the arbitrary conditions
which I found it necessary to assume in order to simplify the analysis. But
seeing that the fit is good and that with such a small sample even the divergent
curve is not altogether impossible, I think it likely that there was some sort of
correlation, though probably not that particular kind which has been assumed
in this note.


Conclusions 

1. That the approach to normality of the distribution of means of samples
drawn from a non-Gaussian population is delayed by the existence of correlation
between the individuals composing the samples.

2. That on certain arbitrary assumptions the constants of the new distribution
can be found given the constants of the old one and r according to formulae
given above.

3. That using the above formulae and choosing a likely looking value of r, a
curve can be drawn to represent the sample in Drs Greenwood and White's paper
with fair likelihood.
APPENDIX TO MERCER AND HALL'S PAPER ON
“THE EXPERIMENTAL ERROR OF FIELD TRIALS” 

[J. Agric. Sci. IV (1911), p. 128]

Note on a Method of Arranging Plots so as to Utilize a given Area 
of Land to the Best Advantage in Testing Two Varieties 

The authors have shown that to reduce the error as low as possible it is necessary 
to “scatter” the plots. I propose to deal with this point in the special case when 
a comparison is to be made between only two kinds of plots, let us say two 
varieties of the same kind of cereal. 

If we consider the causes of variation in the yield of a crop it seems that broadly 
speaking they are divisible into two kinds. 

The first are random, occurring at haphazard all over the field. Such would be
attacks by birds, the incidence of weeds or the presence of lumps of manure. The
second occur with more regularity, increasing from point to point or having centres
from which they spread outwards; we may take as instances of this kind changes
of soil, moist patches over springs or the presence of rabbit holes along a hedge.

Having made this distinction between random and regular causes of variation 
let me hasten to add that almost all causes of variation may belong to one or 
other or both of these classes according to the size of the plot in question. 

In any case a consideration of what has been said above will show that any 
“regular” cause of variation will tend to affect the yield of adjacent plots in a 
similar manner; if the yield of one plot is reduced by rabbits from a bury near by, 
the plot next it will hardly escape without injury, while one some distance away 
may be quite untouched and so forth. And the smaller the plots the more are 
causes of variation “regular”; for example, with large plots a thistly patch may 
easily occur wholly within a single plot leaving adjacent plots nearly or altogether
clean, but with quite small plots one which is overgrown with thistles is
almost sure to have neighbours also affected. 

Now if we are comparing two varieties it is clearly of advantage to arrange the 
plots in such a way that the yields of both varieties shall be affected as far as 
possible by the same causes to as nearly as possible an equal extent. 

To do this it is necessary, from what has been said above, to compare together 
plots which lie side by side and also to make the plots as small as may be practicable
and convenient.

There is a reason, apart from the difficulty of cultivating very small plots, 
why the plots should not be made too small, and that is, that when two different 
varieties are sown next to one another, the outside drill of each is under abnormal
conditions and if it be counted in the plot may introduce an error which in a 
small plot may be quite substantial, but if it is not counted the space wasted by 
rejecting the outside drills of small plots becomes considerable. 

Let us suppose that the smallest practicable size of plot has been chosen and 
the land available for the comparison has been divided up into plots of this size 
and sown, chequer fashion, with seed of the two varieties. 

Obviously nothing that we can do (supposing of course careful harvesting) can 
now alter the accuracy of the resulting comparison of yields, but we can easily 
make different estimates of the reliance which we can place on the figures. 

For example, the simplest way of treating the figures would be to take the 
yields of the plots of each variety and determine the standard deviation of each 
kind. Then from published tables we can judge whether such a difference as we 
find between the total yields is likely to have arisen by chance. 

An advance on this is to compare each plot with its neighbour and to determine 
the standard deviation of the differences between these pairs of adjacent plots. 

From what has been said above as to the occurrence of “regular” sources of 
error it will be seen that such differences as these will be to a much larger extent 
dependent on the variety, and to a less extent on errors, than if the mere aggregates 
are compared. 

The standard deviation will therefore be smaller and the confidence which can 
be placed in the result increased. 
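A toy field can illustrate why comparing adjacent plots helps. The sketch below is entirely hypothetical. It models a single strip of plots with a steady fertility gradient plus random error, and two varieties with no real difference alternating along it. It then computes the standard error of the difference between the two totals in two ways: first by treating each variety's plots as a random set, and then from the differences between adjacent pairs. The gradient stands in for a "regular" cause of variation. It inflates the first estimate but largely cancels from the second.

```python
import math
import random
import statistics


def compare_estimates(n_pairs=100, slope=0.5, noise=1.0, seed=3):
    """Toy strip of 2 * n_pairs plots, varieties A and B alternating, no real difference.

    Yield = slope * position + random error. Returns the standard error of
    (total A - total B) estimated (a) from each variety's scatter and
    (b) from the differences between adjacent plots.
    """
    rng = random.Random(seed)
    yields = [slope * i + rng.gauss(0.0, noise) for i in range(2 * n_pairs)]
    a, b = yields[0::2], yields[1::2]
    # (a) each variety's plots treated as a random set
    se_random = math.sqrt(n_pairs * statistics.variance(a) + n_pairs * statistics.variance(b))
    # (b) from the differences between adjacent plots
    diffs = [x - y for x, y in zip(a, b)]
    se_paired = math.sqrt(n_pairs) * statistics.stdev(diffs)
    return se_random, se_paired


print(compare_estimates())
```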

By a further device we can still further decrease the standard deviation and 
increase our certainty. 

For if, instead of harvesting the whole of each plot together, we divide each 
plot into two before harvesting (and that this can be done is clear from the account 
of the work done with the mangolds and wheat), then we get twice the number of 
comparisons, and the plots being half the size are comparatively closer together 
and the error of their comparison is reduced. 

But, it will be asked, why take all this trouble? The error of comparing plots
of any given size has been found by the authors of the paper, and all that has to 
be done is to apply this knowledge to the particular set of experiments. 

The answer to this point is that there is no such thing as the absolute error of 
a given size of plot. We may find out the order of it, be sure perhaps that it is not 
likely to be less than (say) 5 % nor more than 15 % without producing visible 
heterogeneity, but the error of a given size of plot must vary with all the external 
conditions as well as with the particular crops upon which the experiment is 
being conducted, and it is far better to determine the error from the figures of 
the experiment itself; only so can proper confidence be placed in the result of the 
experiment. 

The diagram illustrates the proposed method of arranging the plots. 



The different shading represents the two different varieties.
The firm lines represent the outside of the original plots.

AA is part of the boundary of the experimental ground, part of which is
given in the diagram.

The dotted lines show the further division made at harvesting.

Then the yields of the half-plots 1, 1; 2, 2; ... etc. are compared together.

The outside half-plots are neglected, as it is usual to discard the edge of the field.
I have determined the error of comparing plots of different sizes in this way
both with the mangold and the wheat figures.



Considering first the mangolds: 

The crop on half an acre in the present experiment was about 32,860 lb., and the standard
deviation of a single one-two-hundredth acre was found to be
20.37 lb. Hence the standard deviation of half an acre made up at random from
100 such small plots would be $20.37 \times \sqrt{100}$ or 203.7 lb., and the standard deviation
of the comparison between two such half acres would be $203.7 \times \sqrt{2}$ or 287 lb.

This would amount to 0.87%, so that one could not begin to be sure of a
difference between two varieties of mangolds compared in this way (one-two-hundredth
plots arranged at random) until it amounted to, say, 2.6%.

But now suppose that the plots were each originally one-hundredth acre,
bisected at harvest and compared as suggested above.

Then the actual figures given by the authors enable us to determine the standard
deviation of the difference between the half acres.

It amounts to no more than 223 lb. or 0.68%, i.e. although working with plots
twice the size up to harvest time we get the same accuracy with one acre of ground
as would have been obtained with $(0.87/0.68)^2$ acres or 1.65 acres on the first plan.
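Here is the arithmetic of the preceding paragraphs as a short sketch. Multiplying out $20.37 \times \sqrt{100} \times \sqrt{2}$ gives about 288 lb, or 0.88% of the crop, within a pound of the 287 lb quoted. Because the standard deviation of a comparison falls as the square root of the area, the area needed for a 1% standard deviation is the square of the percentage obtained on one acre: for example, 0.68% corresponds to about 0.46 acres. The helper `area_for_one_percent` is our name for that reasoning.

```python
import math

crop = 32860.0      # lb of mangolds on half an acre
sd_plot = 20.37     # lb, s.d. of a single one-two-hundredth acre plot

sd_half_acre = sd_plot * math.sqrt(100)      # 100 such plots make up half an acre
sd_comparison = sd_half_acre * math.sqrt(2)  # difference of two such half acres
pct_random = 100 * sd_comparison / crop
pct_paired = 100 * 223 / crop                # 223 lb: bisected one-hundredth plots


def area_for_one_percent(pct):
    """Acres needed for a 1% s.d. when one acre gives pct%, since s.d. falls as 1/sqrt(area)."""
    return pct ** 2


print(round(sd_comparison), round(pct_random, 2), round(pct_paired, 2))
print([round(area_for_one_percent(p), 2) for p in (0.68, 0.83, 0.88, 1.00)])
```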

Now suppose the plots to be one-fiftieth acre, divided into one-hundredths at harvest.


Then I find the s.d. to be 274 lb. or 0.83%.

Similarly *th acre plots harvested as ^ths give a S.D. of comparison 289 lb. or 0-88% 

;; * ;; ;; sc T<SgJv ” ”> 329 ” 1 -°° 0/ » 

With such small numbers the difference between the last two cannot be taken 
as significant, but one would expect the square plot to give a worse comparison 

than the long plot. 

We may summarize the above results in the table below: 


Size of plot 

Y^th harvested as ^ths 


T l 5 „ square ^ 

i „ long yV 


Percentage s.d. Total area required 
of comparing to give a s.d. of 

acres 1 % in the comparison 

0-68 046 acres 

0-83 0-69 „ 

0-88 0-77* „ 

114 1*30* „ 

i.nn 1 - 00 * 


The corresponding figures derived from the wheat results are set out in the 
second table: 


Size of plot 


4h divided into sixths at harvest 


» T6 

taken at random 


q n in lb ofl Total area re- 

comparing Sl aS a q uired to give a 
two P half g °1 P p ° r l s.d. of 1 % in 

acres a acre the comparison 


0-50 acres 
0-74 „ 

1-37 „ 

1 - 10 * „ 
3*84* „ 
1*08 „ 


n 1 ^=^^ 

Both these tables show that in the actual fields which were measured, the area
of land required to give a comparison between two varieties would increase rapidly
as the size of plot increased if the same accuracy were required in the result.

Roughly speaking, one-twentieth acre plots of mangolds would require at least
twice as much land as one-two-hundredth acre plots in order that we may place
as much confidence in the result, while one-fiftieth acre plots of wheat would
probably require more than twice as much as one-five-hundredth acre plots.

Hence it is clearly of advantage to use the smallest practicable size of plot.

Also the advantage of comparing adjacent plots is apparent in these examples, 
since with the roots less than two-thirds of the land is required to give the same 
accuracy as random comparison and with the wheat less than half. 

Of course the comparison of whole half-acre plots would be liable to give errors
of quite a different order: thus the South half acre of mangolds is 4.7% better
than the North half acre, while the West half acre of wheat is 8.3% better than
the East half acre; such differences would be quite impossible if the half acre
were subdivided into the smaller sizes of plots. 
THE CORRECTION TO BE MADE TO THE CORRELATION
RATIO FOR GROUPING*


[Biometrika, IX (1913), p. 316]

Using the ordinary notation, viz. $n_{x_p}$ = the number in the $x$ array of $y$'s whose
mean is at $x_p$, $\bar{y}_{x_p}$ = the mean of this array, $N$ the total number in the sample,
and $\bar{y}$ the general mean of $y$, we have $\eta^2$ defined by the relation

$$\eta^2 = \frac{S\{n_{x_p}(\bar{y}_{x_p} - \bar{y})^2\}}{N\sigma_y^2} \quad\ldots\text{(i)}$$


If $\eta^2$ is required to fit a regression curve to the actual observations as in Prof.
Pearson's original memoir "On the general theory of skew correlation and non-linear
regression" (Drapers' Company Research Memoirs, Biometric Series, II
(1905)), no correction is necessary.

But if we require a ratio which shall remain constant under wide variations of
grouping and of number in the sample, and which shall consequently be more comparable
from one sample to another, there are two corrections to be made.

The first of these has already been given by Prof. Pearson (Biometrika, viii
(1911), p. 256), and he has expressed it as follows: if $\eta^2$ be the value of $\eta^2$ actually
found by the use of (i), and $H^2$ be the value which would be found from an infinitely
large sample, then if $k$ be the number of $x$ arrays,

$$H^2 = \frac{\eta^2 - (k-1)/N}{1 - (k-1)/N} \quad\ldots\text{(ii)}$$

But there is a further effect of grouping which has not hitherto been noted 
and which can be evaluated as follows: 

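Suppose the $x_p$ array to be divided into elementary $x$ arrays, and let $\bar{y}_{p'}$ be the
mean of the elementary array at $x_{p'}$ and $n_{p'}$ its frequency.

Then clearly the proper contribution of the $x_p$ array to $\eta^2$ is

$$\frac{S\{n_{p'}(\bar{y}_{p'} - \bar{y})^2\}}{N\sigma_y^2}.$$

This is equal to

$$\frac{S\{n_{p'}(\bar{y}_{x_p} - \bar{y} + \bar{y}_{p'} - \bar{y}_{x_p})^2\}}{N\sigma_y^2} = \frac{S\{n_{p'}(\bar{y}_{x_p} - \bar{y})^2\} + 2S\{n_{p'}(\bar{y}_{x_p} - \bar{y})(\bar{y}_{p'} - \bar{y}_{x_p})\} + S\{n_{p'}(\bar{y}_{p'} - \bar{y}_{x_p})^2\}}{N\sigma_y^2}.$$

Now $\bar{y}_{x_p} - \bar{y}$ is of course constant for this summation,

$S(n_{p'}) = n_{x_p}$ and $S\{n_{p'}(\bar{y}_{p'} - \bar{y}_{x_p})\} = 0$,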

* See “On the measurement of the influence of ‘broad categories’ on correlation” by 
Karl Pearson, Biometrika, ix (1913), p. 118. 








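therefore the contribution to $\eta^2$ is

$$\frac{n_{x_p}(\bar{y}_{x_p} - \bar{y})^2}{N\sigma_y^2} + \frac{S\{n_{p'}(\bar{y}_{p'} - \bar{y}_{x_p})^2\}}{N\sigma_y^2} \quad\ldots\text{(iii)}$$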

The first of these two terms is that which is obtained in the ordinary way, so
the contribution of each array should be corrected by the addition of the second
term, and $\eta^2$ itself by the addition of

$$\frac{S[S\{n_{p'}(\bar{y}_{p'} - \bar{y}_{x_p})^2\}]}{N\sigma_y^2} \quad\ldots\text{(iv)}$$

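Now if Prof. Pearson's correction (ii) has been made, we may take the point
whose coordinates are $(\bar{x}_p, \bar{y}_{x_p})$ to lie on the regression line, and if further we assume
the regression line to be linear throughout the $x_p$ group and to be inclined at an
angle of $\tan^{-1}\left(r_p\frac{\sigma_y}{\sigma_x}\right)$ to the horizontal, we have

$$\bar{y}_{p'} - \bar{y}_{x_p} = r_p\frac{\sigma_y}{\sigma_x}(x_{p'} - \bar{x}_p).$$

Hence (iv) becomes

$$\frac{S[r_p^2\,S\{n_{p'}(x_{p'} - \bar{x}_p)^2\}]}{N\sigma_x^2}.$$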


Now $S\{n_{p'}(x_{p'} - \bar{x}_p)^2\}$ is the second moment of the $x_p$ group about its own mean
and when the distribution is known can often be approximately evaluated.
Similarly, when the distribution is known $r_p$ can be estimated and the correction
to $\eta^2$ calculated group by group.

But by making certain assumptions we can very much simplify the work, and 
a practical test, in which the assumptions are not justified, will show the sort of 
errors which are introduced. 

The first assumptions are that the regression is linear and the arrays homoscedastic.
In this case of course $r_p$ is constant and equal to $\eta$; we are practically
determining a value of $r$ by the $\eta$ method.

The correction then becomes

$$\frac{\eta^2\,S[S\{n_{p'}(x_{p'} - \bar{x}_p)^2\}]}{N\sigma_x^2},$$

or writing $N\sigma_x^2A^2 = S[S\{n_{p'}(x_{p'} - \bar{x}_p)^2\}]$ and $H^2$ for the raw value of $\eta^2$ after
using Pearson's correction, we get from (iii) $\eta^2 = H^2 + \eta^2A^2$, or

$$\eta^2 = \frac{H^2}{1 - A^2} \quad\ldots\text{(v)}$$

To obtain a value for A 2 we still require to postulate something of the nature of 
the distribution, and I propose to treat (i) of the case where the unit of grouping 
is constant and small enough for the frequency in each group to be considered to 
be distributed as a trapezium, and (ii) of the case where the frequency distribution 
is normal. 











(i) First to find the second moment of a trapezium about its mean.

Let $z_s$ and $z_{s'}$ be the ordinates forming the "walls" of the trapezium and let
the group unit be $h$.

Then $y = z_s + \frac{z_{s'} - z_s}{h}x$ is the equation to the "roof" referred to the "floor"
and left-hand "wall" as axes. The area is clearly $\frac{h}{2}(z_s + z_{s'})$.

The mean is at

$$\frac{2}{h(z_s + z_{s'})}\int_0^h yx\,dx = \frac{h(z_s + 2z_{s'})}{3(z_s + z_{s'})}.$$

The second moment coefficient about the axis of $y$ is

$$\frac{2}{h(z_s + z_{s'})}\int_0^h yx^2\,dx = \frac{h^2(z_s + 3z_{s'})}{6(z_s + z_{s'})}.$$

The second moment coefficient about the mean is

$$\frac{h^2(z_s + 3z_{s'})}{6(z_s + z_{s'})} - \frac{h^2(z_s + 2z_{s'})^2}{9(z_s + z_{s'})^2} = \frac{h^2}{18}\cdot\frac{z_s^2 + 4z_sz_{s'} + z_{s'}^2}{(z_s + z_{s'})^2} = \frac{h^2}{12}\left\{1 - \frac{1}{3}\left(\frac{z_{s'} - z_s}{z_s + z_{s'}}\right)^2\right\}.$$

Clearly when $h$ is reasonably small $\left(\frac{z_{s'} - z_s}{z_s + z_{s'}}\right)^2$ is a quantity of the second order,
and in this case

$$A^2 = \frac{h^2}{12\sigma_x^2} \quad\ldots\text{(vii)}$$

so that

$$\eta^2 = \frac{H^2}{1 - h^2/12\sigma_x^2}$$

when the unit of grouping is uniform and small.
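Here is a numerical check on the trapezium result, written as a sketch with names of our own choosing. The result in question is that the second moment about the mean of a trapezium of width $h$ with walls $z_s$, $z_{s'}$ is $\frac{h^2}{12}\{1 - \frac{1}{3}(\frac{z_{s'} - z_s}{z_s + z_{s'}})^2\}$. Simpson's rule is exact for the polynomials involved, so integrating the roof directly should reproduce this closed form. With equal walls it reduces to $h^2/12$, the quantity behind (vii).

```python
def trapezium_second_moment(zs, zs2, h, m=200):
    """Second moment about its own mean of a trapezium of width h and walls zs, zs2.

    Integrates the 'roof' y = zs + (zs2 - zs) x / h with Simpson's rule, which is
    exact here because every integrand is a polynomial of degree <= 3.
    """
    step = h / m

    def roof(x):
        return zs + (zs2 - zs) * x / h

    def simpson(g):
        s = g(0.0) + g(h)
        for i in range(1, m):
            s += (4 if i % 2 else 2) * g(i * step)
        return s * step / 3

    area = simpson(roof)
    mean = simpson(lambda x: roof(x) * x) / area
    return simpson(lambda x: roof(x) * (x - mean) ** 2) / area


def closed_form(zs, zs2, h):
    """h^2/12 * {1 - ((zs2 - zs)/(zs + zs2))^2 / 3}."""
    return h * h / 12 * (1 - ((zs2 - zs) / (zs + zs2)) ** 2 / 3)


print(trapezium_second_moment(2.0, 3.0, 0.5), closed_form(2.0, 3.0, 0.5))
```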

(ii) When the unit of grouping is neither uniform nor small and there is no
special knowledge of the nature of the distribution, we must needs fall back on
the Gaussian curve to give us a first approximation to $z_s$ and $z_{s'}$ for each group.
In this case*

$$1 - A^2 = N\,S\left\{\frac{(z_s - z_{s'})^2}{n_s}\right\},$$

and it is necessary to determine it, after fitting the frequency by means of
Sheppard's tables.
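For normal grouping, $1 - A^2$ is the variance of the group means in standard-deviation units. The sketch below computes it (names ours), using `math.erf` for the normal integral in place of Sheppard's tables. A group between boundaries $a$ and $b$ holds a share $p = \Phi(b) - \Phi(a)$ and has mean $\{\phi(a) - \phi(b)\}/p$. Summing $\{\phi(a) - \phi(b)\}^2/p$ over the groups therefore gives $1 - A^2$. With two groups split at the mean the result is $2/\pi$. With fine uniform grouping it approaches $1 - h^2/12$, in agreement with (vii).

```python
import math


def phi(x):
    """Ordinate of the unit normal curve."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def Phi(x):
    """Area of the unit normal curve to the left of x."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def one_minus_A2(bounds):
    """1 - A^2 for a unit normal variable grouped at the given boundaries (s.d. units).

    Each group's mean is (phi(a) - phi(b)) / p with p = Phi(b) - Phi(a); the sum of
    p * mean^2 is the variance of the group means, the overall mean being 0 when
    the groups cover the whole curve.
    """
    total = 0.0
    for a, b in zip(bounds[:-1], bounds[1:]):
        p = Phi(b) - Phi(a)
        total += (phi(a) - phi(b)) ** 2 / p
    return total


print(one_minus_A2([-math.inf, 0.0, math.inf]), 2 / math.pi)
fine = [-6 + 0.2 * i for i in range(61)]
print(one_minus_A2(fine), 1 - 0.2 ** 2 / 12)
```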


* The suggestion of this formula I owe to Prof. Pearson. 


Finally, what correction, if any, is to be made for the grouping of $y$?

This will become more apparent from the alternative formula for $\eta^2$, namely

$$1 - \eta^2 = \frac{S(y - \bar{y}_s)^2}{N\sigma_y^2}.$$

For the second moment of each array should be corrected by the subtraction
of $n_sk^2/12$, where $k$ is the unit of grouping of $y$, so that

$$\eta^2 = 1 - \frac{S(y - \bar{y}_s)^2 - Nk^2/12}{S(y - \bar{y})^2 - Nk^2/12} = \frac{S(y - \bar{y})^2 - S(y - \bar{y}_s)^2}{S(y - \bar{y})^2 - Nk^2/12}$$

$$= \frac{S(y - \bar{y}_s + \bar{y}_s - \bar{y})^2 - S(y - \bar{y}_s)^2}{N\sigma_y^2}$$

$$= \frac{S(y - \bar{y}_s)^2 + 2S(y - \bar{y}_s)(\bar{y}_s - \bar{y}) + S(\bar{y}_s - \bar{y})^2 - S(y - \bar{y}_s)^2}{N\sigma_y^2}$$

$$= \frac{S\{n_s(\bar{y}_s - \bar{y})^2\}}{N\sigma_y^2},$$

since $S(\bar{y}_s - \bar{y})^2$ when summed for each individual becomes $S\{n_s(\bar{y}_s - \bar{y})^2\}$ when
summed for each array, and $S(y - \bar{y}_s)(\bar{y}_s - \bar{y})$ vanishes for each array.

Hence there is no correction to be made for the $y$ grouping except Sheppard's
correction for the standard deviation of $y$.

I have tested the results on an instance given in Prof. Pearson’s original Drapers’ 
Company Research Memoir, namely the age and auricular height in girls, correlation 
table pp. 34 and 54. The means of the arrays in the full table are as follows: 




These were grouped in seven ways, in three of which the groups were of equal 
width, and the other four give an attempt at equal frequency: the method of 
grouping is set out by means of columns headed in Roman numerals. The age 
distribution differs significantly from the normal, the constants being $\beta_1 = 0.0013$,
$\beta_2 = 2.7101$, but it would perhaps have been better to have selected a less normal
distribution: still it represents the ordinary “cocked hat” statistics that tend to 
occur. 

The regression is certainly not very linear, the growth apparently ceasing at 
about 18-19. 

The values of $\eta^2$ (the raw value), $H^2$ (the value after using Prof. Pearson's
correction) and $\eta'^2$ (the value after attempting to use the $\lambda^2$ correction) are given
in the following table:


                                                            1 − λ² from normal curve
Grouping  Groups   η²       η      H²       η'²      η'     η'²      η'
I           20     0.09188  0.303  0.08414  0.08489  0.291  0.08494  0.291
II          10     0.08657  0.294  0.08290  0.08595  0.293  0.08510  0.292
III          5     0.07701  0.278  0.07535  0.08786  0.296  0.08635  0.294
IV           9     0.08886  0.297  0.08510     -       -    0.08953  0.299
V            6     0.08342  0.289  0.08136     -       -    0.08913  0.299
VI           5     0.08218  0.287  0.08053     -       -    0.08885  0.298
VII          2     0.06208  0.249  0.06159     -       -    0.09739  0.312


It will be seen that the first three, with even grouping, are very close together, 
though the number of groups has been reduced from 20 to 5. Similarly, the next 
three are close together, and the last is again by itself. 

An examination of the way in which the groups are taken shows that the more 
the tail is bunched together the higher is the value found, and this is what would 
be expected in this particular case, since there is practically no increase of head 
height with age at the “old” end of the scale, whereas for purpose of calculation 
we have assumed a constant angle for the regression line. But it may be pointed 
out that $\eta'$ varies (to the second place of decimals) only from 0.29 to 0.31 even if
we reduce the twenty groups to two, an extreme proceeding which is never done 
in practice. 

At the same time the ordinary six or eight groups may be expected to give 
results a little too high when, as is usual, the regression line is curved. 




THE ELIMINATION OF SPURIOUS CORRELATION 
DUE TO POSITION IN TIME OR SPACE 

[Biometrika, X (1914), p. 179]


In the Journal of the Royal Statistical Society for 1905,* p. 696, appeared a paper
by R. H. Hooker giving a method of determining the “correlation of variations
from the instantaneous mean” by correlating corresponding differences between
successive values. This method was invented to deal with the many statistics
which give the successive annual values of vital or commercial variables; these
values are generally subject to large secular variations, sometimes periodic, sometimes
uniform, sometimes accelerated, which would lead to altogether misleading
values were the correlation to be taken between the figures as they stand.

Since Mr Hooker published his paper, the method has been in constant use 
among those who have to deal statistically with economic or social problems, and 
helps to show whether, for example, there really is a close connexion between the 
female cancer death rate and the quantity of imported apples consumed per head! 

Prof. Pearson, however, has pointed out to me that the method is only valid 
when the connexion between the variables and time is linear, and the following 
note is an effort to extend Mr Hooker’s method so as to make it applicable in a 
rather more general way. 

If $x_1, x_2, x_3$, etc., $y_1, y_2, y_3$, etc. be corresponding values of the variables x and y,
then if $x_1, x_2, x_3$, etc., $y_1, y_2, y_3$, etc. are randomly distributed in time and space,
it is easy to show that the correlation between the corresponding nth differences
is the same as that between x and y.

Let ${}_nD_x$ be the nth difference.

For ${}_1D_x = x_1 - x_2$, $({}_1D_x)^2 = x_1^2 - 2x_1x_2 + x_2^2$.

Summing for all values and dividing by N, and remembering that since $x_1$ and
$x_2$ are mutually random $S(x_1x_2) = 0$, we get†

$$\sigma^2_{{}_1D_x} = 2\sigma_x^2, \qquad \text{and similarly} \qquad \sigma^2_{{}_1D_y} = 2\sigma_y^2.$$


* The method had been used by Miss Cave in Proc. Roy. Soc. lxxiv, pp. 407 et seq., that 
is in 1904, but being used incidentally in the course of a paper it attracted less attention than 
Hooker’s paper which was devoted to describing the method. The papers were no doubt 
quite independent. 

† The assumption made is that n is sufficiently large to justify the relations
$S_1^{n-1}(x)/(n-1) = S_2^{n}(x)/(n-1) = S_1^{n}(x)/n$ and $S_1^{n-1}(x^2)/(n-1) = S_2^{n}(x^2)/(n-1) = S_1^{n}(x^2)/n$
being taken to hold.
Again, ${}_1D_x\,{}_1D_y = x_1y_1 - x_2y_1 - x_1y_2 + x_2y_2$.

Summing for all values and dividing by N, and remembering that $x_1$ and $y_2$
and $x_2$ and $y_1$ are mutually random,

$$r_{{}_1D_x{}_1D_y}\,\sigma_{{}_1D_x}\sigma_{{}_1D_y} = 2r_{xy}\sigma_x\sigma_y,$$

$$\therefore\; r_{{}_1D_x{}_1D_y} = r_{xy}.$$

Proceeding successively,

$$r_{{}_nD_x{}_nD_y} = r_{{}_{n-1}D_x{}_{n-1}D_y} = \cdots = r_{xy}. \qquad (1)$$

Now suppose $x_1, x_2, x_3$, etc. are not random in space or time; the problems arising
from correlation due to successive positions in space are exactly similar to those 
due to successive occurrence in time, but as they are to some extent complicated 
by the second dimension, it is perhaps simpler to consider correlation due to time. 

Suppose then 

$$x_1 = X_1 + bt_1 + ct_1^2 + dt_1^3 + \text{etc.}, \qquad x_2 = X_2 + bt_2 + ct_2^2 + dt_2^3 + \text{etc.},$$

where $X_1, X_2$, etc. are independent of time and $t_1, t_2, t_3$ are successive values of
time, so that $t_n - t_{n-1} = T$, and suppose $y_1 = Y_1 + b't_1 + c't_1^2 + d't_1^3 + \text{etc.}$ as before.

Then, since $t_2 = t_1 + T$,

$${}_1D_x = {}_1D_X - \{bT + cT^2 + dT^3 + \text{etc.}\} - t_1\{2cT + 3dT^2 + 4eT^3 + \text{etc.}\} - t_1^2\{3dT + 6eT^2 + \text{etc.}\} - \text{etc.}$$

In this series the coefficients of $t_1$, $t_1^2$, etc. are all constants and the highest power
of $t_1$ is one lower than before, so that by repeating the process again and again we
can eliminate t from the variable on the right-hand side, provided of course that 
the series ends at some power of t. 

When this has been done, we get 

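$${}_nD_x = {}_nD_X + \text{a constant},$$
$${}_nD_y = {}_nD_Y + \text{a constant},$$
$$\therefore\; r_{{}_nD_x{}_nD_y} = r_{{}_nD_X{}_nD_Y} = r_{XY},$$

and of course $r_{{}_{n+1}D_x{}_{n+1}D_y} = r_{{}_nD_x{}_nD_y}$, for ${}_nD_x$ and ${}_nD_y$ are now random variables
independent of time.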

Hence if we wish to eliminate variability due to position in time or space and
to determine whether there is any correlation between the residual variations, all
that has to be done is to correlate the 1st, 2nd, 3rd, ... nth differences between
successive values of our variable with the 1st, 2nd, 3rd, ... nth differences between
successive values of the other variable. When the correlation between the two
nth differences is equal to that between the two (n + 1)th differences, this value
gives the correlation required.
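The procedure can be sketched in Python. The two series below are invented for illustration: their random parts are correlated at r = 0.5 and each carries an assumed quadratic trend in time, so the raw figures are dominated by the trend, while the second and later differences should settle near 0.5. The helper names are hypothetical, and the correlations are ordinary product-moment coefficients.

```python
import math
import random


def corr(a, b):
    """Product-moment correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def diff(seq):
    """Differences between successive values."""
    return [seq[i + 1] - seq[i] for i in range(len(seq) - 1)]


def difference_correlations(x, y, max_order):
    """Correlation of the raw figures (order 0) and of the 1st..max_order-th differences."""
    out = []
    for _ in range(max_order + 1):
        out.append(corr(x, y))
        x, y = diff(x), diff(y)
    return out


rng = random.Random(12345)
N, r = 5000, 0.5
x, y = [], []
for t in range(N):
    g1, g2 = rng.gauss(0, 1), rng.gauss(0, 1)
    x.append(g1 + 0.001 * t * t)  # random part plus an assumed quadratic trend
    y.append(r * g1 + math.sqrt(1 - r * r) * g2 + 0.003 * t * t)

results = difference_correlations(x, y, 3)
for order, value in enumerate(results):
    print(order, round(value, 3))
```

With a quadratic trend the second differences already remove it, so the correlations of the second and third differences agree, which is the stopping rule given above.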

This process is tedious in the extreme, but that it may sometimes be necessary 
is illustrated by the following examples: the figures from which the first two are 
taken were very kindly supplied to me by Mr E. G. Peake, who had been using 
them in preparing his paper “The application of the statistical method to the
bankers’ problem” in The Bankers' Magazine (July-August 1912). The material
for the next is taken from a paper in The Journal of Agricultural Science (iv,
1911) by Mercer and Hall, on the error of field trials, and consists of the yields of wheat
and straw on the five hundred 1/500-acre plots into which an acre of wheat was divided
at harvest. The remainder are from three of the Registrar-General’s Returns.



                        I               II          III         IV          V           VI
Correlation between...  Sauerbeck's     Marriage    Yield of    Tuberculosis death rate
                        index numbers   rate        grain
and ...                 Bankers'        Wages       Yield of    Infantile mortality
                        clearing house              straw
                        returns per head
                                                                Ireland     England     Scotland
Raw figures             -0.33           -0.52       +0.753      +0.63       +0.35       +0.02
First difference        +0.51           +0.67       +0.590      +0.75       +0.69       +0.51
Second difference       +0.30           +0.58       +0.539      +0.74       +0.74       +0.65
Third difference        +0.07           +0.52       +0.530
Fourth difference       +0.11           +0.55       +0.524
Fifth difference        +0.05           +0.58
Sixth difference                        +0.55
Number of cases         41 years        57 years    500 plots   42 years


The difference between I and II is very marked, and would seem to indicate 
that the causal connexion between index numbers and Bankers’ clearing house 
rates is not altogether of the same kind as that between marriage rate and wages, 
though all four variables are commonly taken as indications of the short period 
trade wave. I had hoped to investigate this subject more thoroughly before 
publishing this note, but lack of time has made this impossible. 










TABLES FOR ESTIMATING THE PROBABILITY THAT THE
MEAN OF A UNIQUE SAMPLE OF OBSERVATIONS LIES BETWEEN
−∞ AND ANY GIVEN DISTANCE OF THE MEAN OF
THE POPULATION FROM WHICH THE SAMPLE IS DRAWN

[Biometrika, XI (1917), p. 414]

In the last number of Biometrika (xi (1916), p. 277) Mr Young completes the
table given in vol. x, p. 522 of the standard deviation frequency curves for
small samples by working out the cases where the numbers in the sample are as 
low as two and three. 

In the course of his note he writes: “The smallest sample considered is that of
n = 4, but samples of two and three are of occasional occurrence, especially in
physical work, and now and again a value of the probable error of an experimental
result is deduced from a set of two or of three observations.”

Further on he states: “It is evident that the probable error determined from 
a set of three observations is very untrustworthy and that when there are only 
two observations it is very much worse.” 

Now in my original paper (Biometrika, vi, p. 1 [2]) I stopped at n = 4 because I
had not realized that anyone would be foolish enough to work with probable errors 
deduced from a smaller number of observations, but now I too will complete my 
tables, which will I think emphasize the moderation of the second quotation 
from Mr Young’s note. 

Generally speaking there are two objects in determining the standard deviation 
of a set of observations, namely (1) to compare it with the standard deviation of 
similar sets of observations, and (2) to estimate the accuracy with which the 
mean of the observations represents the mean of the population from which the 
sample is drawn. 

The former purpose is served by the table which Mr Young was engaged in 
completing, the latter, which is by far the most common use of the s.d., by the 
table which I gave in my original paper and which I now propose to complete 
downwards by including n = 2 and n = 3 and to extend upwards as far as n = 30.

In the tables the probability is given (to four places of decimals) that the mean
of a unique sample shall lie between −∞ and a distance z from the mean of the
population, z being measured in terms of the s.d. (s) of the sample.

[By unique I mean to say that all the information which we have (or at all 
events intend to use) about the distribution of the population is given by the 
sample in question.] 

To compare with the last column of the table (n = 30) I have given the corresponding
probability calculated from the nearest normal curve, namely the one
with s.d. $s/\sqrt{n - 3}$ (not $s/\sqrt{n - 1}$ as is usually given), and this shows, I think, that
for ordinary purposes Sheppard's tables may be used with n > 30.
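The comparison can be sketched numerically. The code below assumes Student's convention that s is computed with divisor n and z = (sample mean − population mean)/s, so that the curve of z is proportional to $(1 + z^2)^{-n/2}$; substituting z = tan θ turns the required integral into one of $\cos^{n-2}\theta$, evaluated here by Simpson's rule. The normal curve uses s.d. $s/\sqrt{n-3}$ as in the text. Function names are illustrative, not taken from the original tables.

```python
import math


def student_prob(z, n, steps=2000):
    """Probability that the mean of a sample of n lies between -inf and z (in units of s)."""
    if n < 2 or steps % 2:
        raise ValueError("need n >= 2 and an even number of Simpson steps")
    # Normalising constant of (1 + z^2)^(-n/2): Gamma(n/2) / (sqrt(pi) * Gamma((n-1)/2))
    c = math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2)) / math.sqrt(math.pi)
    upper = math.atan(z)  # z = tan(theta)
    h = upper / steps
    total = 1.0 + math.cos(upper) ** (n - 2)  # Simpson end points (cos 0 = 1)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * math.cos(k * h) ** (n - 2)
    return 0.5 + c * total * h / 3


def normal_approx(z, n):
    """Nearest normal curve: s.d. of the mean taken as s / sqrt(n - 3)."""
    x = z * math.sqrt(n - 3)
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


for z in (0.1, 0.2, 0.3, 0.4):
    print(z, round(student_prob(z, 30), 4), round(normal_approx(z, 30), 4))
```

For n = 30 the two columns should agree closely, which is the sense in which the text allows Sheppard's tables beyond that size.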

With regard to samples of two it will be seen that odds of 9 to 1 are reached at
a little more than three times the s.d., of 99 to 1 at a little more than thirty times,
of 999 to 1 at a little more than 300 times, while 9999 to 1 is reached at or about
3000 times the s.d.!
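For samples of two these figures can be checked in closed form. Assuming Student's convention (s computed with divisor n, z = (sample mean − population mean)/s), the n = 2 curve is proportional to $1/(1 + z^2)$, so the probability from −∞ to z is $\frac{1}{2} + \frac{\arctan z}{\pi}$ and can be inverted directly. A minimal sketch; the function name is hypothetical.

```python
import math


def z_for_odds_n2(odds):
    """Distance z (in units of s) at which odds of `odds` to 1 are reached when n = 2."""
    p = odds / (odds + 1)                  # e.g. 9 to 1 -> probability 0.9
    return math.tan(math.pi * (p - 0.5))   # inverse of 1/2 + atan(z)/pi


for odds in (9, 99, 999, 9999):
    print(f"{odds} to 1: z = {z_for_odds_n2(odds):.1f}")
```

This gives roughly 3.1, 31.8, 318 and 3183, in agreement with the multiples quoted above.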

Perhaps I may be permitted to restate my opinion as to the best way of judging 
the accuracy of physical or chemical determinations. 

After considerable experience I have not encountered any determination which 
is not influenced by the date on which it is made; from this it follows that a number 
of determinations of the same thing made on the same day are likely to lie more 
closely together than if the repetitions had been made on different days. 

It also follows that if the probable error is calculated from a number of observa¬ 
tions made close together in point of time much of the secular error will be left 
out and for general use the probable error will be too small. 

Where then the materials are sufficiently stable it is well to run a number of 
determinations on the same material through any series of routine determinations 
which have to be made, spreading them over the whole period. 

Thus an analyst may be determining the percentage of nitrogen in different 
samples of seed corn and wish to know the probable error of the determination, 
i.e. how accurately his figures give the percentage of nitrogen in a bulk of corn.

Let us suppose that he makes ten determinations a day for sixty days and that 
it is of some real importance to him to get a clear idea of his error; he will do well 
to get sixty different samples from the same bulk of corn and analyse one of these 
on each of the sixty days; unless I am much mistaken he will have a more modest 
idea of his infallibility than he had before he compared the sixty results together. 
He will also, in so far as his repeated sample is representative, get a close approxi¬ 
mation to the probable error of a single determination. 

In some cases it is not possible to obtain a sufficient bulk of material, and then 
it may be better to determine each result in duplicate, the repetitions being 
separated as widely as possible in point of time. Then the square root of the mean 
of the squares of the differences between corresponding pairs gives twice the 
standard deviation of the average of a pair, and if enough pairs can be taken and 
the determinations made on different samples this is a better method than the 
other, as the error of the sampling is better sampled. 
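The duplicate method can be sketched as follows. If each determination has standard deviation σ, the difference of a pair has variance 2σ², so the root mean square difference estimates $\sqrt{2}\,\sigma$, which is twice $\sigma/\sqrt{2}$, the standard deviation of the average of a pair. The figures below are invented, and the hypothetical helper assumes the two members of each pair differ only by error.

```python
import math


def duplicate_errors(pairs):
    """From duplicate determinations (a, b), estimate
    (s.d. of a single determination, s.d. of the mean of a pair)."""
    rms_diff = math.sqrt(sum((a - b) ** 2 for a, b in pairs) / len(pairs))
    sd_single = rms_diff / math.sqrt(2)  # rms difference estimates sqrt(2) * sigma
    sd_pair_mean = rms_diff / 2          # ... which is twice sigma / sqrt(2)
    return sd_single, sd_pair_mean


# Hypothetical nitrogen percentages, each pair run on widely separated days
pairs = [(1.52, 1.48), (1.61, 1.65), (1.47, 1.50), (1.58, 1.53), (1.55, 1.55)]
single, pair_mean = duplicate_errors(pairs)
print(round(single, 4), round(pair_mean, 4), round(0.6745 * pair_mean, 4))  # last: probable error
```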

In the preparation of the tables a slight mistake was discovered in the second 
row of the odd numbers in the original table by Mr W. L. Bowie, to whom I am 
indebted for the calculation of the new figures. 
AN EXPLANATION OF DEVIATIONS FROM
POISSON’S LAW IN PRACTICE

[Biometrika, XII (1919), p. 211]

In her paper on the Poisson law of small numbers, Biometrika , x, pp. 36 et seq., 
Miss Whitaker after a very interesting analysis of the various attempts which 
have been made to test Poisson’s law on actual statistics concludes that “A
general interpretation based on a very simple conception seems needed for those 
demographic cases in which the law of small numbers appears far more often to 
correspond to a negative than to a positive binomial”. 

The following is an attempt to explore the general question of what effect 
various departures from the conditions which lead to Poisson’s law have on the 
resulting statistics, and especially which conditions lead to positive and which to 
negative binomials when the exponential might at first sight be expected. 

Poisson’s law has been applied to the occurrence of different numbers of
individuals in divisions of space or time: thus of yeast cells in squares of a haemacytometer,
of deaths from the kick of a horse in Prussian Army Corps which may be
taken as individuals occurring in divisions of space, or of suicides of children per
year in Prussia which are individuals occurring in divisions of time. In such cases
it has been asserted that if the chance of an individual being found in a given
division is so small that when multiplied by the very large number of individuals
the product is still a reasonably small number, then the frequency of divisions
containing 0, 1, 2, ... r individuals will be given by the terms of the exponential

$$Ne^{-m}\left(1 + m + \frac{m^2}{2!} + \cdots + \frac{m^r}{r!} + \cdots\right),$$

where N is the number of divisions and m the mean number of individuals
occurring in a division.
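The terms of the exponential are conveniently computed by recurrence, each term being the previous one multiplied by m/(r + 1). A short Python sketch; N and m in the example are assumed values.

```python
import math


def poisson_frequencies(N, m, r_max):
    """Expected numbers of divisions containing 0, 1, ..., r_max individuals."""
    freqs = []
    term = N * math.exp(-m)  # N e^{-m}: divisions with no individual
    for r in range(r_max + 1):
        freqs.append(term)
        term *= m / (r + 1)  # N e^{-m} m^(r+1) / (r+1)!
    return freqs


# Hypothetical: 400 haemacytometer squares, mean 1.5 cells per square
for r, f in enumerate(poisson_frequencies(400, 1.5, 6)):
    print(r, round(f, 1))
```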

For the above to be true it is necessary 

(1) That the chance of falling in a division is the same for each individual. 

(2) That the chance of an individual falling in it is the same for each division. 

(3) That the fact that an individual has fallen in a division does not affect the 
chance of other individuals falling therein. 

As to these three conditions (1) is seldom or never true. I propose to show that 
this is generally unimportant; unless the chances of some individuals falling in a 
particular division are relatively high the Poisson law holds; the tendency however 
is towards a positive binomial.



Next (2) is comparatively seldom true except in the case of artificial divisions. 
The result of this, as Pearson has shown, is that a negative binomial fits the results 
better than the exponential. 

Lastly (3) is often untrue. It will be shown that if the presence of an individual 
makes another less likely to fall into a division the positive binomial, but if more
likely, the negative binomial, will fit the figures best. 

We may start from the fact that if the chance of an event happening be q and 
of its not happening p, then the chances of its happening 0, 1, 2, etc. times in n
trials are given by the terms of the expansion of $(p + q)^n$, viz.

$$p^n, \quad np^{n-1}q, \quad \frac{n(n-1)}{2!}p^{n-2}q^2, \quad \text{etc.}$$

As the moment coefficients of this series about the zero end of the range are

$$\nu_1 = nq, \qquad \nu_2 = npq + n^2q^2, \quad\text{whence}\quad \mu_2 = npq,$$

the binomial is completely determined if we know $\nu_1$ and $\mu_2$, for

$$p = \frac{\mu_2}{\nu_1}, \qquad q = 1 - \frac{\mu_2}{\nu_1} \qquad\text{and}\qquad n = \frac{\nu_1}{q},$$

and in particular the binomial is positive (i.e. n and q are positive) if $\mu_2/\nu_1 < 1$ and
negative if $\mu_2/\nu_1 > 1$. In the particular case when $\mu_2/\nu_1 = 1$ the binomial becomes
the Poisson exponential.
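Since p, q and n follow from the first two moments alone, the fit can be sketched in a few lines of Python. The helper below is hypothetical and simply applies p = μ₂/ν₁, q = 1 − p, n = ν₁/q.

```python
import math


def binomial_from_moments(nu1, mu2):
    """Binomial (p + q)^n with mean nu1 = nq and variance mu2 = npq."""
    ratio = mu2 / nu1
    if math.isclose(ratio, 1.0):
        return None, None, None, "Poisson exponential"
    p = ratio       # p = mu2 / nu1
    q = 1 - p
    n = nu1 / q     # n = nu1 / q
    kind = "positive binomial" if ratio < 1 else "negative binomial"
    return n, p, q, kind


print(binomial_from_moments(3.0, 2.1))  # variance below the mean: positive binomial
print(binomial_from_moments(2.0, 3.0))  # variance above the mean: n and q negative
```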

It is therefore unnecessary to deal with higher moments than the second for 
the purpose in hand. 

Let us first consider the result of each individual having a different chance of 
falling in a given division. 

Let the chances of n individuals falling in a given division be $q_1, q_2, q_3, \ldots, q_n$.
The chances of their not doing so are therefore $(1 - q_1), (1 - q_2), (1 - q_3), \ldots, (1 - q_n)$,
and the chances that 0, 1, 2, ..., n of them will fall in that division are given by the
various terms of the expansion of

$$\{(1 - q_1) + q_1\}\{(1 - q_2) + q_2\}\{(1 - q_3) + q_3\}\cdots\{(1 - q_n) + q_n\},$$

i.e. by

$$(1 - q_1)(1 - q_2)\cdots(1 - q_n) + S\{q_1(1 - q_2)\cdots(1 - q_n)\} + S\{q_1q_2(1 - q_3)\cdots(1 - q_n)\} + \cdots + S\{q_1q_2\cdots q_r(1 - q_{r+1})\cdots(1 - q_n)\} + \cdots,$$

the term $S\{q_1q_2\cdots q_r(1 - q_{r+1})\cdots(1 - q_n)\}$ giving the chance that exactly r
individuals will fall in the division.

The sum of the above series is clearly unity, so that the first and second moment
coefficients about the zero end of the series are given by two series of which the
rth terms are

$$rS\{q_1q_2\cdots q_r(1 - q_{r+1})\cdots(1 - q_n)\} \quad\text{and}\quad r^2S\{q_1q_2\cdots q_r(1 - q_{r+1})\cdots(1 - q_n)\}$$

respectively.
These series may be summed by rearranging them in the ascending order of the
q products, thus:

$$S\{q_1(1 - q_2)(1 - q_3)\cdots(1 - q_n)\} = S(q_1) - 2S(q_1q_2) + \cdots + (-1)^{r-1}rS(q_1q_2\cdots q_r) + \cdots$$

$$2S\{q_1q_2(1 - q_3)\cdots(1 - q_n)\} = 2S(q_1q_2) + \cdots + (-1)^{r-2}r(r-1)S(q_1q_2\cdots q_r) + \cdots$$

$$\cdots\cdots$$

$$tS\{q_1q_2\cdots q_t(1 - q_{t+1})\cdots(1 - q_n)\} = tS(q_1q_2\cdots q_t) + \cdots + (-1)^{r-t}\frac{r!}{(t-1)!\,(r-t)!}S(q_1q_2\cdots q_r) + \cdots$$

$$\cdots\cdots$$

$$rS\{q_1q_2\cdots q_r(1 - q_{r+1})\cdots(1 - q_n)\} = rS(q_1q_2\cdots q_r) + \cdots$$

Adding these we get on the left $\nu_1$ and on the right $S(q_1)$ + a number of terms
of the form $r(1 - 1)^{r-1}S(q_1q_2\cdots q_r)$, which accordingly vanish, and we get

$$\nu_1 = S(q_1).$$

In a similar manner it can be shown that

$$\nu_2 = S(q_1) + 2S(q_1q_2),$$

and other moment coefficients about zero can be found in the same way, but we 
are not here concerned with them.* 

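If $\bar{q}$, $\overline{q^2}$ are the mean values of q and $q^2$, and $\sigma_q$ is the standard deviation of the q's, obviously

$$\nu_1 = S(q_1) = n\bar{q}, \qquad (1)$$

and

$$\nu_2 = S(q_1) + 2S(q_1q_2) = S(q_1) + \{S(q_1)\}^2 - S(q_1^2) = n\bar{q} + n^2\bar{q}^2 - n\overline{q^2}, \qquad (2)$$

$$\nu_2 = n\bar{q} + n^2\bar{q}^2 - n\bar{q}^2 - n\sigma_q^2, \qquad (3)$$

$$\mu_2 = n\bar{q} - n\bar{q}^2 - n\sigma_q^2 = n\bar{q}\left(1 - \bar{q} - \frac{\sigma_q^2}{\bar{q}}\right). \qquad (4)$$

If now the distribution of chances is to be represented by the binomial $(P + Q)^N$,
then

$$Q = 1 - \frac{\mu_2}{\nu_1} = 1 - \frac{n\bar{q}(1 - \bar{q} - \sigma_q^2/\bar{q})}{n\bar{q}} = \bar{q} + \frac{\sigma_q^2}{\bar{q}}. \qquad (5)$$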


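* The moment coefficients are:

$$\mu_2 = np\bar{q} - n\,{}_q\mu_2,$$

$$\mu_3 = np\bar{q}(p - \bar{q}) - 3n(p - \bar{q})\,{}_q\mu_2 + 2n\,{}_q\mu_3,$$

$$\mu_4 = np\bar{q}\{1 + 3(n - 2)p\bar{q}\} - n\{7 + 6(n - 6)p\bar{q}\}\,{}_q\mu_2 + 12n(p - \bar{q})\,{}_q\mu_3 - 6n\,{}_q\mu_4 + 3n^2\,{}_q\mu_2^2,$$

where ${}_q\mu_2$, ${}_q\mu_3$, etc. are the moment coefficients of the q distribution and $p = 1 - \bar{q}$.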


Since the original q's are the chances of events happening they are always
positive, so that the above expression must be positive and the binomial positive.

If now we introduce the Poisson condition that q, though positive, is negligibly
small, (5) becomes in general zero, for $\sigma_q$ is usually of the same order as $\bar{q}$, and in
that case Poisson’s law holds in spite of the inequality of the original q's. If
however $\sigma_q^2/\bar{q}$ is appreciably greater than zero (as in the extreme case
$q_1 = 1$, $q_2 = q_3 = \cdots = q_n = 0$, when $\sigma_q^2/\bar{q} = 1 - 1/n$),
the distribution of chances is to be represented by a positive binomial.
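Equations (1) to (5) can be checked by building the exact distribution of the number falling in a division when each individual has its own chance, multiplying in one factor $\{(1 - q_i) + q_i\}$ at a time. The chances below are invented; the sketch confirms that $Q = 1 - \mu_2/\nu_1$ equals $\bar q + \sigma_q^2/\bar q$ and is therefore positive. Helper names are hypothetical.

```python
def poisson_binomial_pmf(qs):
    """Chances that exactly 0, 1, ..., n individuals fall in the division,
    individual i falling in it independently with chance qs[i]."""
    pmf = [1.0]
    for q in qs:
        new = [0.0] * (len(pmf) + 1)
        for r, prob in enumerate(pmf):
            new[r] += prob * (1 - q)  # factor (1 - q_i): this individual stays out
            new[r + 1] += prob * q    # factor q_i: this individual falls in
        pmf = new
    return pmf


def first_two_moments(pmf):
    """nu_1 (mean about zero) and mu_2 (second moment about the mean)."""
    nu1 = sum(r * p for r, p in enumerate(pmf))
    nu2 = sum(r * r * p for r, p in enumerate(pmf))
    return nu1, nu2 - nu1 ** 2


qs = [0.02, 0.05, 0.10, 0.01, 0.30, 0.07]  # hypothetical unequal chances
n = len(qs)
qbar = sum(qs) / n
var_q = sum((q - qbar) ** 2 for q in qs) / n
nu1, mu2 = first_two_moments(poisson_binomial_pmf(qs))
Q = 1 - mu2 / nu1
print(round(Q, 6), round(qbar + var_q / qbar, 6))  # equation (5): the two agree
```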

Next we have to consider the effect of disregarding condition (2), namely that 
the chance of an individual falling into it must be the same for each division. 

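Let us suppose then that the $\bar{q}$'s are all different for each division, so that $n\bar{q}$ is
also different.

Then writing m for $n\bar{q}$, and $\bar{m}$, $\overline{m^2}$, $\overline{nq^2}$ for the means of m, $m^2$ and $n\overline{q^2}$ taken over
all the divisions, we get from (1)

$$\nu_1 = \bar{m}, \qquad (6)$$

from (2)

$$\nu_2 = \bar{m} + \overline{m^2} - \overline{nq^2} = \bar{m} + \bar{m}^2 + \sigma_m^2 - \overline{nq^2}, \qquad (7)$$

$$\mu_2 = \bar{m} + \sigma_m^2 - \overline{nq^2}. \qquad (8)^{*}$$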


As before, if $(P + Q)^N$ is the best-fitting binomial,

$$Q = 1 - \frac{\mu_2}{\nu_1} = \frac{\overline{nq^2} - \sigma_m^2}{\bar{m}}.$$

Hence if $\sigma_m^2 > \overline{nq^2}$, which if there is any appreciable variation in m is probable,
since as explained above $\overline{nq^2}$ is generally negligible, a negative binomial will be
found to fit better than the exponential.

Clearly condition (2) is usually not fulfilled in the vital and demographic 
statistics; divisions either of space or time are generally governed by different 
environments which will vary the chances of an individual falling into them, and 
so we may expect that as a rule negative binomials will occur in place of the 
exponential. 
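The effect of unequal divisions can be sketched the same way. In the Python example below every individual has the same chance within a division, but the chance differs from division to division (all values assumed); averaging the binomials for the divisions gives $\mu_2/\nu_1 > 1$, and $Q = 1 - \mu_2/\nu_1$ agrees with $(\overline{nq^2} - \sigma_m^2)/\bar m$.

```python
import math


def binomial_pmf(n, q):
    return [math.comb(n, r) * q ** r * (1 - q) ** (n - r) for r in range(n + 1)]


n = 200                                           # individuals (hypothetical)
division_q = [0.005, 0.010, 0.015, 0.020, 0.030]  # chance differs by division (hypothetical)

# Divisions equally represented: average their binomial distributions
pmf = [0.0] * (n + 1)
for q in division_q:
    for r, p in enumerate(binomial_pmf(n, q)):
        pmf[r] += p / len(division_q)

nu1 = sum(r * p for r, p in enumerate(pmf))
mu2 = sum(r * r * p for r, p in enumerate(pmf)) - nu1 ** 2

ms = [n * q for q in division_q]                  # m = nq for each division
m_bar = sum(ms) / len(ms)
var_m = sum((m - m_bar) ** 2 for m in ms) / len(ms)
mean_nq2 = sum(n * q * q for q in division_q) / len(division_q)

Q = 1 - mu2 / nu1
print(round(mu2 / nu1, 4), round(Q, 6), round((mean_nq2 - var_m) / m_bar, 6))
```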

Finally, suppose that the presence of an individual in a division influences the 
chance of other individuals falling in that division. 

Clearly it may do so either by way of increasing the chance or diminishing it. 

* If we suppose that q does not vary with the individual but that nq (= m) varies with the
division, the moment coefficients of the m distribution being written ${}_m\mu_2$, ${}_m\mu_3$, ${}_m\mu_4$, then the moment
coefficients of the resulting distribution of divisions are as follows:

$$\mu_2 = \bar{m} + {}_m\mu_2,$$

$$\mu_3 = \bar{m} + 3\,{}_m\mu_2 + {}_m\mu_3,$$

$$\mu_4 = \bar{m} + 3\bar{m}^2 + (7 + 6\bar{m})\,{}_m\mu_2 + 6\,{}_m\mu_3 + {}_m\mu_4.$$
If the chance be increased it is clear that we shall get for the same mean number
of individuals per division a larger number of divisions containing high numbers
of individuals and a larger number of zero divisions. In other words, for the same
mean we shall get a larger standard deviation, so that $\mu_2/\nu_1$ will be greater than 1
and a negative binomial will fit better than the exponential. On the other hand,
if the chance of other individuals is decreased by the presence of one already in
a division, $\mu_2/\nu_1$ will become less than unity and the best-fitting binomial will be
positive. The first of these two cases includes linking or clumping of events or
bacteria, the second such a thing as the counting of large cells on a haemacytometer
whose divisions are comparable in size with them.
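Two deliberately extreme toy models, both assumed purely for illustration, show the direction of the effect. In the first, individuals always arrive in pairs (clumping), the pairs themselves following Poisson's law; in the second, a division can hold at most one large cell. The hypothetical helper computes $\mu_2/\nu_1$ for each.

```python
import math


def ratio_mu2_nu1(pmf):
    """mu_2 / nu_1 for probabilities of 0, 1, 2, ... individuals per division."""
    nu1 = sum(r * p for r, p in enumerate(pmf))
    mu2 = sum(r * r * p for r, p in enumerate(pmf)) - nu1 ** 2
    return mu2 / nu1


# Clumping: pairs follow Poisson's law with mean lam, so divisions hold 0, 2, 4, ... individuals
lam = 0.75
clumped = [0.0] * 61
for k in range(31):
    clumped[2 * k] = math.exp(-lam) * lam ** k / math.factorial(k)

# Exclusion: at most one large cell per division, present with chance 0.6
exclusive = [0.4, 0.6]

print(ratio_mu2_nu1(clumped), ratio_mu2_nu1(exclusive))
```

The ratio comes out above 1 for clumping and below 1 for exclusion, pointing respectively to a negative and a positive binomial.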

We have now shown that a population which might be expected at first sight 
to follow Poisson’s law 

(1) Will do so if the only deviation from the ideal conditions is that the chances 
of different individuals falling into the same division are not equal, as long as
these chances are all small. 

(2) If in addition to this the chances of some individuals are large a positive 
binomial will fit the results better than the exponential. 

(3) If the different divisions have different chances of containing individuals, 
as is usual, a negative binomial will fit the results better than the exponential 
except in so far as (2) may interfere. 

(4) If the presence of one individual in a division increases the chance of other 
individuals falling into that division, a negative binomial will fit best, but if it 
decreases the chance a positive binomial. 

Generally speaking (3) is the operating deviation from Poisson’s conditions and 
accordingly most statistics give negative binomials. 

Finally, I should like to point out that the object of my original paper (Biometrika,
vol. v [1]) was to give the user of the haemacytometer a guide to the error
which he may expect from its use, and that the net result was that the probable
error of his count was $0.6745\sqrt{N}$, where N was the total number counted,* and
that if N be a reasonably large number tables of the probability integral may be
used, otherwise the exponential (or better still go on counting). This result is not
affected by slight deviations from the Poisson law, any more than slight deviations
from the normal law affect our use of the probability integral tables.


* Vol. v, p. 356. The probable error of the mean is 0.6745 where m is the mean
AN EXPERIMENTAL DETERMINATION OF THE
PROBABLE ERROR OF DR SPEARMAN’S
CORRELATION COEFFICIENTS

[Being a paper read to the Society of Biometricians and
Mathematical Statisticians, 13 December 1920]


[Biometrika, XIII (1921), p. 263]


In the British Journal of Psychology, ix, p. 96 * Dr Spearman suggested two methods 
of determining correlation, based on replacing actual measurements by ranks. 
As an illustration we may take the following purely imaginary example: 


Table I 


Individual 


Height 


6 ' 0 " 
5' 3" 
5' T 

6 / r 


Length of 
middle finger 
mm. 


12-8 

11-5 

100 

124 


Rank in
height 


Rank in 
length of 
finger 


Instead of correlating the figures in the second and third columns of the above
table Dr Spearman proposed to use the figures in the fourth and fifth columns,
and to determine one or other of two coefficients: of these the first ($\rho$) gives the
ordinary correlation coefficient between the figures representing the ranks, and
the second (R) was described as a “footrule” for correlation, i.e. a rough instrument
which could be used by the unskilled. Dr Spearman also proposed to use
R in cases where it was thought advisable to weight mediocre observations more
heavily than extremes.

The method of determining $\rho$ and R was to take the difference D between the
numbers representing the ranks, e.g. for A in Table I

$$D = 2 - 1 = 1.$$


* [Dr Spearman’s results were first given in a paper entitled “The proof and measurement
of association between two things” in the American Journal of Psychology, xv, pp. 72-10.
The dogmatic statements as to the accuracy of his methods in that paper are, I think, erroneous,
and he does not lay adequate stress on the fact that correlation of ranks is not a correlation
of variates and may differ very considerably from it. The suggestion of considering
the correlation of ranks is due to A. Binet and V. Henri: see La Fatigue Intellectuelle (Paris,
1898) p. 252, also L'Année Psychologique (Paris, 1898), iv, p. 155. Their process is very
obscure and they also do not appear to have realized that the correlation of variates is not
that of ranks. K.P.]
Then

$$\rho = 1 - \frac{6S(D^2)}{n(n^2 - 1)} \qquad \text{(i)}$$

and

$$R = 1 - \frac{6S(D)}{n^2 - 1}, \qquad \text{(ii)}$$

where n is the number in the sample: in the case of R, S(D) denotes the summation
of positive differences only.
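Formulae (i) and (ii) translate directly into code. A minimal Python sketch taking two lists of ranks; the function name and the example rankings are invented for illustration.

```python
def spearman_rho_and_R(rank_x, rank_y):
    """Spearman's rho (i) and footrule R (ii) from two lists of ranks."""
    n = len(rank_x)
    d = [a - b for a, b in zip(rank_x, rank_y)]
    rho = 1 - 6 * sum(di * di for di in d) / (n * (n * n - 1))
    positive = sum(di for di in d if di > 0)  # S(D): positive differences only
    R = 1 - 6 * positive / (n * n - 1)
    return rho, R


print(spearman_rho_and_R([1, 2, 3, 4, 5], [2, 1, 3, 5, 4]))
```

For these rankings ρ = 0.8 and R = 0.5, illustrating that R runs smaller than ρ for the same data.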

Dr Spearman gave an empirical formula connecting R and $\rho$, viz. $\rho = \sin(\tfrac{\pi}{2}R)$,
but I do not suppose that he attached any very great importance to this.

He further gave the probable errors of $\rho$ and R for the case of no correlation as

$$0.6745/\sqrt{n} \quad\text{and}\quad 0.4266/\sqrt{n}.$$

In his memoir “On further methods of determining correlation”* Prof. Pearson
investigated these coefficients for the case of the normal correlation surface and
found the relations between $\rho$ and R and r, the ordinary correlation coefficient,
to be

$$r = 2\sin\left(\frac{\pi}{6}\rho\right) \qquad \text{(iii)}$$

and

$$r = 2\cos\left\{\frac{\pi}{3}(1 - R)\right\} - 1. \qquad \text{(iv)}$$

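Pearson further found the standard error of $\rho$ for large samples to be

$$\frac{1 - \rho^2}{\sqrt{n}}\{1 + 0.086\rho^2 + 0.013\rho^4 + 0.002\rho^6 + \cdots\}, \qquad \text{(v)}$$

and that of $r_\rho$, i.e. r determined from $\rho$ by (iii), to be

$$1.0472\,\frac{1 - r^2}{\sqrt{n}}\{1 + 0.042r^2 + 0.008r^4 + 0.002r^6 + \cdots\}. \qquad \text{(vi)}$$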
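Pearson's relations $r = 2\sin(\pi\rho/6)$ and $r = 2\cos\{\pi(1 - R)/3\} - 1$ are easily coded. The sketch also notes that the constant 1.0472 in the standard error of $r_\rho$ is π/3, the slope of the first relation at ρ = 0, which is where a first-order propagation of the error of ρ would put it. Function names are illustrative.

```python
import math


def r_from_rho(rho):
    """r = 2 sin(pi * rho / 6)."""
    return 2 * math.sin(math.pi * rho / 6)


def r_from_R(R):
    """r = 2 cos(pi * (1 - R) / 3) - 1."""
    return 2 * math.cos(math.pi * (1 - R) / 3) - 1


slope_at_zero = math.pi / 3  # d r / d rho at rho = 0
print(round(r_from_rho(0.5), 4), round(r_from_R(0.5), 4), round(slope_at_zero, 4))
```

Note that R = −0.5 maps to r = −1, consistent with the range of R mentioned below.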


He did not succeed in evaluating the error of R or of $r_R$ (i.e. of r determined by
(iv)), but pointed out that just as in the case of r the $\sqrt{n}$ in the denominator is
really $\sqrt{n - 1}$. He also pointed out that R can only take values between +1 and
−0.5 and that Spearman's $0.4266/\sqrt{n - 1}$ does not imply that R is more accurate
than $\rho$ or r with their probable error of $0.6745/\sqrt{n - 1}$, since R itself is smaller
than $\rho$ or r in about the same proportion.

Since that time the use of p and R has become general among psychologists, 
especially in America, where they are preferred to r on account of the ease and 
speed with which they can be determined for small samples. 

For example, in a note on correlation in Employment Psychology, by H. C.
Link,† a book written to urge the claims of Psychology on the devotees of “Scientific
Management”, the author mentioned three methods of determining correlation:
$\rho$, which is to be used for samples smaller than 30; R, for samples over 30; and
r, which, though acknowledged to be rather more accurate, is not to be used at all
since it takes four times as long to calculate as the others.

* Drapers' Company Research Memoirs, Biometric Series, iv, 1907.
† Macmillan, 1919.

Now to save time at the expense of accuracy is justifiable when, and only when,
the time saved can be devoted to increasing the number of observations so as to
obtain greater accuracy on the whole series, otherwise it will take longer to get
equally trustworthy conclusions, and it seems to be of interest to investigate the
probable errors of $\rho$ and R for samples of the size that the employment psychologist
is contemplating. And here we may note that the saving of time only occurs
when the sample is comparatively small; as it increases, the labour of grading
becomes more and more severe till at some point in the neighbourhood of 40 it
becomes quicker to use the ordinary product moment r if that be possible.

It should perhaps be pointed out that there are many cases where it is possible 
to grade a sample for some character which is not capable of being measured on a 
scale, and it might be thought that in this case large samples could profitably be 
dealt with by the p or R method, but in fact it is just these scaleless characters 
which present the greatest difficulty in grading. 

We have then to consider the variability of $\rho$ and R and of the derivatives $r_\rho$
and $r_R$, determined from small samples, and it seemed worth while to use the
material of a former sampling experiment so as to get an idea of how small samples
depart from the results obtained by Prof. Pearson for ideally large samples. The
material in question consists of 750 samples of four drawn from a population of
3000 criminals whose height and left middle finger length give an approximately
normal correlation surface with correlation 0.66.

These are capable of being combined easily to give 375 samples of 8 and in 
addition there are 100 samples of 30, which may be taken to be a size of sample 
which is no longer quite “small”. 

Accounts of the former results were given in Biometrika, vi, p. 1 [2] and p. 302 [3],
since which time the frequency distributions of the correlation coefficients of 
small samples drawn from normally correlated populations have been very 
thoroughly investigated by Soper, Fisher and the authors of the co-operative 
paper in vol. xi, p. 328 of Biometrika: it is hoped that some mathematician may 
be interested in the general solution of the problems raised in the present paper, 
which may then afford material for checking his results. 

When I came to apply the methods to my samples I found that, owing to the 
rather coarse grouping, there were a large number of ties, so that it became 
necessary to find out the right correction for ties. 

Prof. Pearson had discussed the question of ties and had suggested two ways of dealing with them. One way was to rank them all as if they were the highest number of the tie, which he called the bracket-rank method. The other was to rank them all half-way down the tie, which he called the mid-rank method. Thus, if the second and third of four individuals constituted a tie, the first way would rank them 1, 2, 2, 4, while the second would rank them 1, 2½, 2½, 4.

Now the first method would give different results according as we read the scale forwards or backwards, and it would also alter the mean of the set of numbers. I have therefore only tried to use the mid-rank method, for which I have found the correction which follows.


Correction of ρ for Ties

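If D = x − y, where x and y are any two variables measured from their means,

$$D^2 = x^2 + y^2 - 2xy.$$

Summing for all n pairs and dividing by n,

$$\frac{\Sigma(D^2)}{n} = \sigma_x^2 + \sigma_y^2 - 2r_{xy}\sigma_x\sigma_y,
\qquad\text{so that}\qquad
r_{xy} = \frac{\sigma_x^2 + \sigma_y^2 - \Sigma(D^2)/n}{2\sigma_x\sigma_y}. \tag{vii}$$

If now x and y are the first n numbers, then

$$\sigma_x^2 = \sigma_y^2 = \frac{1}{n}\times(\text{sum of squares of first } n \text{ numbers}) - \left(\frac{\text{sum of first } n \text{ numbers}}{n}\right)^2
= \frac{(n+1)(2n+1)}{6} - \frac{(n+1)^2}{4} = \frac{n^2-1}{12}.$$

Substituting in (vii), we find

$$\rho = \frac{\dfrac{n^2-1}{6} - \dfrac{\Sigma(D^2)}{n}}{\dfrac{n^2-1}{6}}, \tag{viii}$$

$$\rho = 1 - \frac{6\,\Sigma(D^2)}{n(n^2-1)}. \tag{ix}$$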
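As a numerical check on (vii) to (ix), the short Python sketch below computes ρ from equation (ix) and the ordinary product-moment r of the same untied ranks. It is illustrative only, and the helper names are hypothetical. It confirms that the two agree for every arrangement of five ranks. It also confirms that σ² = (n² − 1)/12 for the first n numbers.

```python
import math
from itertools import permutations

def rho_from_d2(x_ranks, y_ranks):
    """Equation (ix): rho = 1 - 6*sum(D^2) / (n(n^2 - 1)); assumes no ties."""
    n = len(x_ranks)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(x_ranks, y_ranks))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

def product_moment_r(x, y):
    """Ordinary product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def variance_of_first_n(n):
    """Variance of the numbers 1..n, which should equal (n^2 - 1)/12."""
    values = range(1, n + 1)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n

# Compare the two formulae over every arrangement of five untied ranks.
x = [1, 2, 3, 4, 5]
max_gap = max(abs(rho_from_d2(x, y) - product_moment_r(x, y)) for y in permutations(x))
print(max_gap)
print(variance_of_first_n(8), (8 ** 2 - 1) / 12)
```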


Now suppose that there is on the x side a tie of t in number, from q to q + t − 1. Using the mid-rank method, we substitute for each of the numbers q, q + 1, …, q + t − 1 their mean, $\dfrac{2q + t - 1}{2}$.

Hence in finding σ_x² the mean is unaltered, but in the sum of the squares $q^2 + (q+1)^2 + \dots + (q+t-1)^2$ is replaced by $\dfrac{t(2q+t-1)^2}{4}$.

Hence σ_x² is smaller by

$$\frac{1}{n}\left\{q^2 + (q+1)^2 + \dots + (q+t-1)^2 - \frac{t(2q+t-1)^2}{4}\right\}.$$

This is equal to

$$\frac{1}{n}\left[tq^2 + 2q\{1 + 2 + \dots + (t-1)\} + \{1^2 + 2^2 + \dots + (t-1)^2\} - tq^2 - qt(t-1) - \frac{t(t-1)^2}{4}\right]
= \frac{1}{n}\left\{\frac{(t-1)t(2t-1)}{6} - \frac{t(t-1)^2}{4}\right\}
= \frac{t(t^2-1)}{12n},$$

$$\therefore\quad \sigma_x^2 = \frac{n^2-1}{12} - \frac{t(t^2-1)}{12n}.$$

This is clearly additive for any number of ties. So if

$$T_x = \Sigma\left(\frac{t(t^2-1)}{12}\right),$$

summing for all the ties on the x side, and similarly T_y for the y side, then

$$\sigma_x^2 = \frac{n^2-1}{12} - \frac{T_x}{n} \qquad\text{and}\qquad \sigma_y^2 = \frac{n^2-1}{12} - \frac{T_y}{n},$$

and substituting in (vii),
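The variance reduction can be checked numerically. The Python sketch below computes T_x = Σ t(t² − 1)/12 directly from mid-ranks and compares n σ_x² with n(n² − 1)/12 − T_x. The helper names `tie_term` and `n_times_variance` are hypothetical. It uses the ten x ranks of the example given in the text.

```python
from collections import Counter
from fractions import Fraction

def tie_term(ranks):
    """T = sum over tied groups of t(t^2 - 1)/12; untied ranks (t = 1) contribute 0.

    Under the mid-rank method every member of a tie shares one rank value,
    so counting equal values gives the size t of each tie."""
    return sum(Fraction(t * (t * t - 1), 12) for t in Counter(ranks).values())

def n_times_variance(ranks):
    """n * sigma^2, i.e. the sum of squared deviations from the mean, computed exactly."""
    values = [Fraction(r) for r in ranks]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

ranks = [1, 2.5, 2.5, 5, 5, 5, 8.5, 8.5, 8.5, 8.5]
n = len(ranks)
print(tie_term(ranks))                                    # 15/2, i.e. T_x = 7 1/2
print(n_times_variance(ranks))                            # 75
print(Fraction(n * (n * n - 1), 12) - tie_term(ranks))    # 75
```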


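$$\rho = r_{xy} = \frac{\dfrac{n^2-1}{6} - \dfrac{T_x+T_y}{n} - \dfrac{\Sigma(D^2)}{n}}{\sqrt{\left(\dfrac{n^2-1}{6} - \dfrac{2T_x}{n}\right)\left(\dfrac{n^2-1}{6} - \dfrac{2T_y}{n}\right)}}
= \frac{\dfrac{n(n^2-1)}{6} - (T_x+T_y) - \Sigma(D^2)}{\sqrt{\left\{\dfrac{n(n^2-1)}{6} - 2T_x\right\}\left\{\dfrac{n(n^2-1)}{6} - 2T_y\right\}}}. \tag{x}$$

So if T_x and T_y do not differ appreciably, the geometric mean in the denominator is nearly the arithmetic mean, n(n² − 1)/6 − (T_x + T_y). Then

$$\rho = 1 - \frac{\Sigma(D^2)}{\dfrac{n(n^2-1)}{6} - (T_x+T_y)}. \tag{xi}$$

In estimating T_x or T_y,
each pair contributes ½,
each triplet 2,
each quartet 5,
each quintet 10,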

and so on. For example, suppose the x ranks for a sample of 10 were

1, 2½, 2½, 5, 5, 5, 8½, 8½, 8½, 8½.

Then T_x would be ½ + 2 + 5 = 7½, and if there were no ties in the y ranks ρ would be

$$\rho = \frac{165 - 7\tfrac{1}{2} - \Sigma(D^2)}{\sqrt{(165 - 15)\,165}} = \frac{157\tfrac{1}{2} - \Sigma(D^2)}{\sqrt{150 \times 165}}.$$

If we were to take it as $1 - \Sigma(D^2)/157\tfrac{1}{2}$, the error would come in the third significant place of decimals.
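To see the size of the error in that approximation, the Python sketch below evaluates both (x) and (xi) for the ten-rank example, with T_x = 7½ and T_y = 0. The function names are hypothetical. The values of Σ(D²) are arbitrary and chosen only for demonstration.

```python
import math

def rho_exact(sum_d2, n, tx, ty):
    """Equation (x): tie-corrected rank correlation."""
    a = n * (n * n - 1) / 6
    return (a - (tx + ty) - sum_d2) / math.sqrt((a - 2 * tx) * (a - 2 * ty))

def rho_approx(sum_d2, n, tx, ty):
    """Equation (xi): adequate when T_x and T_y do not differ appreciably."""
    a = n * (n * n - 1) / 6
    return 1 - sum_d2 / (a - (tx + ty))

for sum_d2 in (10, 30, 60, 100):   # illustrative values only
    e = rho_exact(sum_d2, 10, 7.5, 0)
    p = rho_approx(sum_d2, 10, 7.5, 0)
    print(sum_d2, round(e, 4), round(p, 4), round(e - p, 4))
```

The two values differ by roughly 0.1%, because 157½/√(150 × 165) is about 1.0011.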

In determining ρ for my 375 samples of 8, I found that much-tied samples usually gave low values of ρ. It occurred to me that, although equation (x) undoubtedly gives the true value of the correlation of ranks, the loss of precision due to ties might give low values for the correlation. To test this I doubled the width of my unit of grouping, first for one variable and then for the other, so that I got three values of ρ for each sample:

(i) Converting the original figures into ranks. 

(ii) Using coarser grouping on one side and the original grouping on the other 
before converting into ranks. 

(iii) Using coarser grouping on both sides. 

An example will make my meaning clearer. 


Coarse grouping puts +1 and 0 as +½, −1 and −2 as −1½, +2 and +3 as +2½, and so on.

              (1) Original    (2) x grouped    (3) Both grouped      Ranks
              figures         coarsely         coarsely              (1)          (2)          (3)
              x     y         x      y         x      y              x     y      x     y      x     y
              0    +3        +½     +3        +½    +2½              5½    3      4½    3      4½    3½
             −2     0       −1½      0       −1½     +½              8     7      7½    7      7½    6½
             +3    +3       +2½     +3       +2½    +2½              1½    3      1½    3      1½    3½
             −1    −2       −1½     −2       −1½    −1½              7     8      7½    8      7½    8
             +1    +3        +½     +3        +½    +2½              3½    3      4½    3      4½    3½
             +1    +2        +½     +2        +½    +2½              3½    5      4½    5      4½    3½
             +3    +4       +2½     +4       +2½    +4½              1½    1      1½    1      1½    1
              0    +1        +½     +1        +½     +½              5½    6      4½    6      4½    6½
Pairs ...                                                            3     0      2     0      2     1
Triplets                                                             0     1      0     1      0     0
Quartets                                                             0     0      1     0      1     1


Here originally T_x = 1½ and T_y = 2, and n(n² − 1)/6 − (T_x + T_y) = 80½.
After grouping x coarsely, T_x = 6 and T_y = 2, and n(n² − 1)/6 − (T_x + T_y) = 76.
After grouping both coarsely, T_x = 6 and T_y = 5½, and n(n² − 1)/6 − (T_x + T_y) = 72½;

and ρ will be found to take the values 0.832, 0.869 and 0.828 in succession. Working in this way, I determined three values of ρ for each of the 375 samples. For each of the three series of 375 I then determined the mean, the standard deviation and the mean (T_x + T_y). These results are given in Table II.
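The whole procedure for the eight-pair example can be reproduced with a short script. This is a sketch, not the author's own working. `mid_ranks`, `coarse` and `rho_with_ties` are hypothetical names. Ranks run from the largest value downwards, as in the table. Coarse grouping merges 0 with +1, −2 with −1, +2 with +3, and so on, as described above the table.

```python
import math
from collections import Counter

def mid_ranks(values):
    """Rank from the largest value (rank 1) downwards; tied values share the mean of their ranks."""
    ordered = sorted(values, reverse=True)
    first = {}
    for position, v in enumerate(ordered, start=1):
        first.setdefault(v, position)          # first position occupied by each value
    counts = Counter(values)
    return [first[v] + (counts[v] - 1) / 2 for v in values]

def coarse(v):
    """Double the grouping unit: 0 and +1 -> +1/2, -2 and -1 -> -3/2, +2 and +3 -> +5/2, ..."""
    return 2 * (v // 2) + 0.5

def tie_term(ranks):
    """T = sum of t(t^2 - 1)/12 over the tied groups of a set of mid-ranks."""
    return sum(t * (t * t - 1) / 12 for t in Counter(ranks).values())

def rho_with_ties(x, y):
    """Equation (x), applied to the mid-ranks of the raw values x and y."""
    rx, ry = mid_ranks(x), mid_ranks(y)
    n = len(rx)
    tx, ty = tie_term(rx), tie_term(ry)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    a = n * (n * n - 1) / 6
    return (a - tx - ty - sum_d2) / math.sqrt((a - 2 * tx) * (a - 2 * ty))

x = [0, -2, 3, -1, 1, 1, 3, 0]
y = [3, 0, 3, -2, 3, 2, 4, 1]
x_coarse = [coarse(v) for v in x]
y_coarse = [coarse(v) for v in y]

rho1 = rho_with_ties(x, y)                 # original grouping
rho2 = rho_with_ties(x_coarse, y)          # x grouped coarsely
rho3 = rho_with_ties(x_coarse, y_coarse)   # both grouped coarsely
print(round(rho1, 3), round(rho2, 3), round(rho3, 3))
```

This prints 0.832, 0.87 and 0.828. Before rounding, the middle value is 0.8696, which agrees with the 0.869 in the text to the figures given.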

Table II

                              Mean ρ     σ        Mean T_x   Mean T_y   Mean (T_x + T_y)
Original series               0.5798     0.2887   1.92       1.90       3.82
x grouped coarsely            0.5798     0.2903   4.67       1.90       6.57
x and y grouped coarsely      0.5696     0.2874   4.67       4.37       9.04

Here an increase in the correction to be made for ties, from 3.82 to 9.04, has made a difference of 0.01 in the mean value of ρ, the probable error being about 0.015. It has made a still less appreciable difference in the standard deviation. It is, I think, a fair inference that the correction is applicable to the series in question, and that the reason for the observed low values of ρ in much-tied samples is to be sought elsewhere.* But it will be asked, “what if no correction be made for ties?” The answer is that the mean value of ρ will rise as the ties become more numerous, and the s.d. will fall. Thus Table II would become Table III if no corrections were made.

Table III

                              Mean ρ     σ        Mean (T_x + T_y)
Original series               0.602      0.2677   3.82
x grouped coarsely            0.616      0.2887   6.57
x and y grouped coarsely      0.622      0.2414   9.04


At first sight this may appear to be highly advantageous, since the mean value approximates more nearly to the value which would be obtained from a large sample, and the s.d. is smaller. A little reflexion will show, however, that the means of the ρ's of all populations would be subject to the same rise. In fact the ρ of one population is no more differentiated from the ρ of another population than it is when corrected, while the mean value when corrected is constant over a fairly wide range of ties. If the correction is not made, ρ can be cooked up to any required value by increasing the ties.

The fact is that as soon as there is a single tie, uncorrected ρ can no longer take all values between +1 and −1. Moreover, if one of the scales be reversed, the correlation, instead of being −ρ, becomes $-\rho + \dfrac{12(T_x + T_y)}{n(n^2-1)}$. We are therefore forced to use the correction, which after all gives us the distribution of ρ that we should get from ideal material containing no ties.

* The low value of ρ for much-tied samples is due to the fact that a much-tied sample is, as a rule, one in which the s.d. of the original variables is low.

Now as a matter of experience I find that, of samples drawn from a normally distributed population, those with s.d. above the average tend to give high and stable values of the correlation coefficient, while those with s.d. below the average tend to give low and variable values.

The form of the correlation surface for the variables σ_x and r_xy is of considerable interest to those who have to deal with small samples, and merits the attention of mathematicians. I hope to deal with the experience obtained from my samples at some later time.
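The sign asymmetry of uncorrected ρ can be demonstrated exactly with fractions. The Python sketch below uses rank columns (1) of the eight-pair example and reverses the y scale; the helper names are hypothetical. It checks that uncorrected ρ becomes −ρ + 12(T_x + T_y)/{n(n² − 1)}, while the corrected value from (x) simply changes sign.

```python
import math
from collections import Counter
from fractions import Fraction

def tie_term(ranks):
    """T = sum of t(t^2 - 1)/12 over the tied groups."""
    return sum(Fraction(t * (t * t - 1), 12) for t in Counter(ranks).values())

def rho_uncorrected(rx, ry):
    """1 - 6*sum(D^2)/(n(n^2 - 1)) applied to mid-ranks without any tie correction."""
    n = len(rx)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - Fraction(6) * sum_d2 / (n * (n * n - 1))

def rho_corrected(rx, ry):
    """Equation (x)."""
    n = len(rx)
    a = Fraction(n * (n * n - 1), 6)
    tx, ty = tie_term(rx), tie_term(ry)
    sum_d2 = sum((p - q) ** 2 for p, q in zip(rx, ry))
    return float(a - tx - ty - sum_d2) / math.sqrt((a - 2 * tx) * (a - 2 * ty))

rx = [Fraction(r) for r in (5.5, 8, 1.5, 7, 3.5, 3.5, 1.5, 5.5)]
ry = [Fraction(r) for r in (3, 7, 3, 8, 3, 5, 1, 6)]
n = len(rx)
ry_reversed = [n + 1 - r for r in ry]   # read the y scale backwards

shift = Fraction(12) * (tie_term(rx) + tie_term(ry)) / (n * (n * n - 1))
print(rho_uncorrected(rx, ry), rho_uncorrected(rx, ry_reversed), -rho_uncorrected(rx, ry) + shift)
print(rho_corrected(rx, ry), rho_corrected(rx, ry_reversed))
```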

To see what happens when ties are carried to an extreme, I determined ρ from the original table of 3000 entries (Biometrika, i, p. 216). I also determined it from the same table condensed to six groups each way by using a 4 in. scale of height and a 0.8 mm. scale of finger lengths.

In the first case ρ = 0.637, giving r_ρ = 0.655, and in the second ρ = 0.557, giving r_ρ = 0.575. There seems, therefore, in extreme cases to be a tendency for the correction to give too low a value of ρ.
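The conversion from ρ to r_ρ is made by equations that fall outside this excerpt. Here it is assumed to be Pearson's relation for normally correlated material, r = 2 sin(πρ/6). That assumption does reproduce both pairs of figures just quoted. On that basis, a minimal Python sketch is:

```python
import math

def r_from_rho(rho):
    """Assumed relation (Pearson): r = 2 sin(pi * rho / 6) for a normal correlation surface."""
    return 2 * math.sin(math.pi * rho / 6)

print(round(r_from_rho(0.637), 3), round(r_from_rho(0.557), 3))   # 0.655 0.575
```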

Correction of R for Grouping

In Dr Spearman’s original paper R is defined as $1 - \dfrac{6S(D)}{n^2-1}$, when $\dfrac{n^2-1}{6}$ is taken as the average value which S(D) assumes.

The simplest way to see that this is the average value is to write down all the 
possible D’s thus: 


1      1
2      2      1
3      3      2      1
4      4      3      2      1
. . .
n−1    n−1    n−2    n−3    n−4    …    1
n      n      n−1    n−2    n−3    …    2    1

Here the two columns on the left are composed of the first n numbers. The 
third column is formed by subtracting the top number of the first column from 
all the numbers in the second column in turn, the fourth by subtracting the 
second number from all numbers which give a positive remainder, and so on. 

Now the numbers in the second column could be arranged opposite the numbers of the first column in n! ways, and any given pair will occur in (n − 1)! of these arrangements. That is, each possible pair occurs in a fraction 1/n of all the arrangements.

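Hence the average value of S(D) will be

$$\frac{1}{n}\Big\{[1 + 2 + \dots + (n-1)] + [1 + 2 + \dots + (n-2)] + \dots + (1 + 2) + 1\Big\},$$

$$\therefore\ \text{Average value of } S(D) = \frac{1}{n}\left\{\frac{n(n-1)}{2} + \frac{(n-1)(n-2)}{2} + \dots + \frac{3\cdot 2}{2} + \frac{2\cdot 1}{2}\right\}
= \frac{1}{2n}\left\{\frac{n(n+1)(2n+1)}{6} - \frac{n(n+1)}{2}\right\}
= \frac{n+1}{12}\{2n + 1 - 3\}
= \frac{n^2-1}{6}. \tag{xii}$$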
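Formula (xii) can be confirmed by brute force for small n. The sketch below averages S(D) over all n! arrangements. S(D) is taken here as the sum of the positive differences, as the enumeration above implies. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def average_gain(n):
    """Mean of S(D) = sum of the positive differences x - y over all n! pairings of 1..n."""
    ranks = range(1, n + 1)
    total = sum(
        sum(a - b for a, b in zip(ranks, perm) if a > b)
        for perm in permutations(ranks)
    )
    return Fraction(total, factorial(n))

for n in range(2, 8):
    print(n, average_gain(n), Fraction(n * n - 1, 6))
```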
If we now substitute ties for consecutive numbers in the second column, we can find out what effect ties will have on the average value of S(D). As I can see no general way of proving the results, I propose merely to state them as follows:

(1) A tie of t on one side which is opposed by no ties on the other side will diminish $\dfrac{n^2-1}{6}$ by $\dfrac{t^3-t}{24n}$ if t be odd, and by $\dfrac{t^3-4t}{24n}$ if t be even.

(2) Overlapping ties on opposite sides interfere with the above simple rule, the total to be subtracted from $\dfrac{n^2-1}{6}$ being increased or decreased according to Table IV.


Table IV 



As an example of the use of Table IV, suppose a set of eight ranks contains on the x side a tie of 5 centred at 3, i.e. let the x ranks be 3, 3, 3, 3, 3, 6, 7, 8. Let the y ranks have a tie of 4 centred at 2½, i.e. let the y ranks be 2½, 2½, 2½, 2½, 5, 6, 7, 8. Then the amount to be subtracted from (8² − 1)/6 is ⅝ (for the 5 tie) + ¼ (for the 4 tie) + ⅜ (from Table IV) = 1¼. Had the y ranks been 1, 3½, 3½, 3½, 3½, 6, 7, 8, the correction would be the same. If the y ranks were 1, 2, 4½, 4½, 4½, 4½, 7, 8, the correction would be ⅝ + ¼ − ¼ = ⅝;

if they were 1, 2, 3, 5½, 5½, 5½, 5½, 8, it would be ⅝ + ¼ − ⅛ = ¾;

and if they were 1, 2, 3, 4, 6½, 6½, 6½, 6½, it would be ⅝ + ¼ = ⅞.

It is only with very small and much-tied samples that the correction is appreciable.
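The effect of ties on the average S(D) can likewise be checked by enumerating all n! arrangements. The Python sketch below uses hypothetical names. Ranks are doubled so that mid-ranks such as 2½ become integers. It reproduces rule (1) and the worked corrections for eight ranks, but it does not attempt to reconstruct Table IV itself.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def mean_gain(x, y):
    """Average over all n! pairings of S(D), the sum of the positive differences x - y.

    Ranks are doubled so that mid-ranks such as 2.5 become integers."""
    x2 = [int(2 * r) for r in x]
    y2 = [int(2 * r) for r in y]
    total = 0
    for perm in permutations(y2):
        total += sum(a - b for a, b in zip(x2, perm) if a > b)
    return Fraction(total, 2 * factorial(len(x)))

def correction(x, y):
    """Amount by which ties reduce the average S(D) below (n^2 - 1)/6."""
    n = len(x)
    return Fraction(n * n - 1, 6) - mean_gain(x, y)

untied = [1, 2, 3, 4, 5, 6, 7, 8]
x = [3, 3, 3, 3, 3, 6, 7, 8]                          # tie of 5 centred at 3
print(correction(x, untied))                          # 5/8, rule (1) with t = 5 (odd)
print(correction(untied, [2.5] * 4 + [5, 6, 7, 8]))   # 1/4, rule (1) with t = 4 (even)
for y in ([2.5] * 4 + [5, 6, 7, 8],
          [1] + [3.5] * 4 + [6, 7, 8],
          [1, 2] + [4.5] * 4 + [7, 8],
          [1, 2, 3] + [5.5] * 4 + [8],
          [1, 2, 3, 4] + [6.5] * 4):
    print(y, correction(x, y))
```

The loop gives 5/4, 5/4, 5/8, 3/4 and 7/8, matching the five corrections in the worked example.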














Giving Frequency Distributions of Various Correlation Coefficients from 375 samples of 8 
Discussion of the Frequency Distributions obtained

Tables V and VI give the frequency distributions of r, determined with and without Sheppard’s corrections for grouping, of ρ and of R, and of their derivatives (from equations (iii) and (iv)), […] determined without Sheppard’s correction. […] the median fourfold division, by Sheppard’s formula r = cos πB/(A + B), where B is the frequency in the “small” cells. This probably suffers a good deal from the coarse grouping, which makes it necessary to divide the central groups […].

The most remarkable thing about these tables is […] the most widespread of all the distributions. There is of course no […] that an examination of these tables may be beneficial for all who try […].

[…] As a matter of interest I have compared lines 2–4 of Table V with line 1 by the χ² test, with the following results:


Table VII

                                     25 groups    16 groups
r with Sheppard’s corrections        0.18         0.17
r without Sheppard’s corrections     0.67         0.66
r_ρ actual                           0.000,064    0.01
r_R actual                           0.000,002    (say) 0.000,002

[…] that no group in line 1 was less than 10. […] decisive for considerable […]
Table VIII

Certain Constants of the Frequency Distributions of Various Correlation Coefficients derived from 375 samples of 8

                                                Mean           S.D.           Coefficient    No. of samples required to give as
                                                                              of variation   great accuracy as 100 samples of
                                                                                             (1)            (2)
(1) r calculated from co-operative paper        0.631          0.250          39.6           100
(2) r actual using Sheppard’s corrections       0.624 ± 0.010  0.274 ± 0.007  43.9 ± 1.3     120 ± 5.9      100
(3) r actual using no correction for grouping   0.614 ± 0.010  0.271 ± 0.007  44.1 ± 1.3     117 ± 5.8      98
(4) r_ρ actual                                  0.586 ± 0.010  0.291 ± 0.007  49.7 ± 1.5     135 ± 6.7      113
(5) r_R actual                                  0.566 ± 0.011  0.309 ± 0.008  54.6 ± 1.7     153 ± 7.5      127
(6) ρ actual                                    0.580 ± 0.011  0.289 ± 0.007  49.8 ± 1.9
(7) R actual                                    0.407 ± 0.008  0.237 ± 0.006  58.2 ± 1.9


Table IX

Certain Constants of the Frequency Distributions of Various Correlation Coefficients derived from 100 samples of 30

                                                  Mean           S.D.           Coefficient    No. of samples required to give as
                                                                                of variation   great accuracy as 100 samples of
                                                                                               (1)            (2)
(1) r calculated from co-operative paper          0.653          0.109          16.7           100
(2) r actual using Sheppard’s corrections         0.661 ± 0.007  0.101 ± 0.005  15.3 ± 0.7     86 ± 8.2       100
(3) r_ρ actual                                    0.639 ± 0.008  0.113 ± 0.005  17.7 ± 0.9     108 ± 10.3     125
(4) r_R actual                                    0.638 ± 0.008  0.122 ± 0.006  19.1 ± 0.9     125 ± 11.9     146
(5) r actual from median fourfold division,       0.609 ± 0.012  0.183 ± 0.009  30.1 ± 1.6     282 ± 25.1     328
    r = cos πB/(A + B)
(6) ρ actual                                      0.624 ± 0.008  0.116 ± 0.006  18.6 ± 0.9
(7) R actual                                      0.428 ± 0.007  0.100 ± 0.005  23.4 ± 1.2

[…] Tables V and VI […] determined (1) on the theoretical basis of […] accurate methods as 100 samples […] time must be saved in order to […] by using the rank methods. First, however, we may note in Table VIII the marked difference between the theoretical S.D. and that actually obtained by the product-moment method. […] have helped to a very small extent […] be high is due to those samples which have low s.d. […] distribution of r will be […]. In Table IX, on the other hand, […] of 30 […] probably […] in the case of samples of 8 […] indeed in this investigation Sheppard’s […] formula gives […] S.D. […] in this case. […] Tables VIII and IX lies in column 4 […] sufficiently fine grouping by the product-moment method […] accept my explanation of […] the numerator and denominator of the fractions from which the figures are calculated. They must, however, be larger than the probable errors in column 4.

* The s.d. calculated from the formula […], being 0.191 if r be taken as 0.66, […] rather higher […] been published. See, however, […] pp. 147 et seq. K.P.]

In any case there is a strong indication that, with samples of 8, the loss of accuracy due to the use of r_ρ instead of r will practically always more than counterbalance the gain of time in calculation. Either method, however, is so little to be depended upon for a single sample of very small size, except as the merest indication, that very little is lost by the use of r_ρ. If, however, a number of small samples can be averaged so as to obtain a coefficient of some value, the product-moment method should be used when possible.

With samples of 30, the 8% more samples required compares fairly with Prof. Pearson’s 10% more for large samples. However, the particular set of 100 samples gave too low a value for σ_r, so the value of σ_{r_ρ}, which must be correlated with it, is likely to be low also, and the 8% may easily be 18% or more.

In any case it would very seldom pay to have to collect 8% more samples of 30, even if one could save 8% of the time on samples of that size.

In both tables there is a considerable loss from the use of r_R instead of r_ρ, since from 13 to 16% more samples of the former would be required to give the same accuracy as the latter. The gain in calculation is not very appreciable, since most of the time is spent in ranking the samples. Dr Spearman prefers R to ρ at times because less importance attaches to outlying samples. But the extremes of small samples tend to be outliers even in normally correlated material, owing to the phenomenon to which attention was drawn in Galton’s Difference problem.* It therefore seems to me that as much weight as possible should be given to them.
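The last two columns of Tables VIII and IX appear to follow from the s.d.'s alone. The variance of a mean falls as 1/N, so matching the accuracy of 100 samples of a reference method needs about 100 × (s.d./reference s.d.)² samples. That reading is an inference from the figures rather than something stated in the text. The Python sketch below (hypothetical function name) applies it to the s.d.'s of Table VIII. It also recovers the 13 to 16% figure from the tabulated sample counts.

```python
def samples_required(sd, sd_ref, ref_samples=100):
    """Samples needed for the same accuracy as ref_samples samples of a method with s.d. sd_ref."""
    return ref_samples * (sd / sd_ref) ** 2

# s.d.'s from Table VIII (samples of 8); reference (1) is 0.250 and reference (2) is 0.274
table_viii = {"r, Sheppard": 0.274, "r, no correction": 0.271, "r_rho": 0.291, "r_R": 0.309}
for name, sd in table_viii.items():
    print(name, round(samples_required(sd, 0.250)), round(samples_required(sd, 0.274)))

# Extra samples of r_R needed to match r_rho, from the tabulated counts (153 vs 135; 125 vs 108)
print(round(100 * (153 / 135 - 1)), round(100 * (125 / 108 - 1)))
```

The results agree with the tabulated counts to within one unit. The small differences presumably reflect rounding of the printed s.d.'s.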


To Combine Two Methods of Determination

At an early stage in the investigation I hoped to be able to combine r and r_ρ to get a value less subject to error than either. Curiously enough, Prof. Pearson, in his editorial in the last number of Biometrika, gives the equations which I proposed to use for the purpose (p. 7 (29)).

As they are perfectly general, I will state them in a slightly more general form.

If x and y be two estimates of any quantity obtained in different ways, then a quantity z can always be found which will have a lower error than either of them, unless x and y are perfectly correlated.


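Thus

$$z = \frac{(\sigma_y^2 - r_{xy}\sigma_x\sigma_y)\,x + (\sigma_x^2 - r_{xy}\sigma_x\sigma_y)\,y}{\sigma_x^2 + \sigma_y^2 - 2r_{xy}\sigma_x\sigma_y} \tag{xiii}$$

and

$$\sigma_z^2 = \frac{\sigma_x^2\sigma_y^2(1 - r_{xy}^2)}{\sigma_x^2 + \sigma_y^2 - 2r_{xy}\sigma_x\sigma_y}. \tag{xiv}$$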


* Biometrika, i, pp. 385-99. In this connexion it is of interest to note that the correlation surface of ranks is not an elliptical hill, as is the normal correlation surface, but two comparatively steep ridges joined by a saddle, the ridges having a skew section.
Giving Correlation (0.885) between r (no corrections) and r_ρ for samples of 8

Giving Correlation (0.903) between ρ Calculated in Original Grouping (ρ₁) and ρ Calculated in Groups twice as coarse (ρ₃)

[…] The reader is requested to note that the subranges here are not the same as in Tables V, VI and […] only table which contains a perfect correlation coefficient, which occurred in the ρ₃ series.
In the case of the samples of 8, x may be taken as r without Sheppard’s corrections and y as r_ρ, when we have

σ_x² = (0.271)² = 0.073,441,   σ_xσ_y = 0.078,861,
σ_y² = (0.291)² = 0.084,681,   r_xy = 0.885,

and hence, from (xiii) and (xiv),

z = 0.804r + 0.196r_ρ   and   σ_z = 0.270,

i.e. there is no appreciable gain in our case, since σ_r is 0.271. It may be that with a lower value of the population correlation the gain would be greater. On the other hand, if r had been determined for very fine grouping, σ_r² would have been 0.0625. The contribution of r_ρ to z would then have been practically negligible, and the gain in accuracy from the use of z would have been less than that found. There is, however, another case where the above formulae might be applied, namely to the values of ρ obtained from the original grouping and those from coarse grouping. These are given in Table II, from the first and third lines of which it appears that σ_ρ₁ and σ_ρ₃ may both be taken as 0.288.

In this case σ_z reduces to

$$\sigma_z = \sigma_\rho\sqrt{\frac{1 + r_{\rho_1\rho_3}}{2}},$$

and as $r_{\rho_1\rho_3} = 0.903$, $\sigma_z = 0.281$.
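Equations (xiii) and (xiv) are easy to evaluate. Taking the minimum-variance combination z = a·x + (1 − a)·y, the weight is a = (σ_y² − rσ_xσ_y)/(σ_x² + σ_y² − 2rσ_xσ_y). The variance is then σ_z² = σ_x²σ_y²(1 − r²)/(σ_x² + σ_y² − 2rσ_xσ_y). The Python sketch below (the function name `combine` is hypothetical) applies this to the two cases in the text: r and r_ρ for samples of 8, and ρ₁ and ρ₃ with equal s.d.'s.

```python
import math

def combine(var_x, var_y, r):
    """Minimum-variance weights for z = a*x + b*y (a + b = 1) and the resulting s.d. of z."""
    cov = r * math.sqrt(var_x * var_y)            # r_xy * sigma_x * sigma_y
    denom = var_x + var_y - 2 * cov
    a = (var_y - cov) / denom                     # weight on x, as in (xiii)
    var_z = var_x * var_y * (1 - r * r) / denom   # as in (xiv)
    return a, 1 - a, math.sqrt(var_z)

# Samples of 8: x = r without Sheppard's corrections, y = r_rho
a, b, sd_z = combine(0.271 ** 2, 0.291 ** 2, 0.885)
print(round(a, 3), round(b, 3), round(sd_z, 3))

# rho from original (rho_1) and doubly coarse (rho_3) grouping, equal s.d.'s
a13, b13, sd_13 = combine(0.288 ** 2, 0.288 ** 2, 0.903)
print(round(a13, 3), round(sd_13, 3))
```

The weights come out as about 0.803 and 0.197, against 0.804 and 0.196 in the text. The difference is presumably due to rounding of the inputs.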

This is somewhat more encouraging, but the process is rather troublesome and could only be applied to cases where there is a proper scale. If, however, there is a proper scale, greater accuracy could be obtained by the product-moment method with very little more trouble (since we now have to make two calculations to find ρ).

We may therefore conclude that as far as this sampling experiment may be 
taken as typical: 

(1) Where the unit of grouping is small (say < … the s.d.), the product-moment method should be used if the most is to be made of the time and statistics at our disposal, however small the sample.

(2) Where a coarse grouping has to be used, the mean value of r will fall below that calculated from the co-operative paper (Biometrika, xi, pp. 328 et seq.) and the s.d. will rise. For small samples Sheppard’s corrections will approximately correct the former but will increase the latter still further. Indeed, it is possible that for very coarse grouping ρ might vary less than r.

(3) For this, or any other, purpose ties should be dealt with by one or other of 
the formulae in equations (x) and (xi) of this paper. 

(4) Where one or both variables can be ranked but not scaled, as frequently happens in some kinds of work, or for what Prof. Pearson has called “purposes of […]”, ρ may be determined with advantage and may be considered the natural method to adopt.

(5) In such cases it should be borne in mind that, for small samples, the distribution of ρ is similar to the distribution of r. However, the mean, even of r_ρ, is lower than that of r and the s.d. greater, by amounts which doubtless depend on the population correlation.

(6) R and r_R are not worth determining in serious work; their use should therefore be confined to the elementary statistics for which its author intended R.

(7) It is interesting to observe that Sheppard’s median-division fourfold table gives, for small samples, a mean value very much below the population value. While this is only what one might have expected, it may in this case be due to the coarse grouping, which prevented me from making an accurate median division.

(8) The following problems might be of interest to mathematicians:

(a) The determination of the form of the rank correlation surface.

(b) The determination of the frequency distribution of ρ for small samples drawn from a normally distributed population.

(c) The determination of the nature of the correlation surface when a standard deviation is taken as one variable and the correlation coefficient as the other, both being determined from small samples drawn from a normally distributed population.
ON TESTING VARIETIES OF CEREALS

[Being a Paper read to the Society of Biometricians and 
Mathematical Statisticians, 28 May, 1923] 

[Biometrika, XV (1923), p. 271] 

Object of Experiments 

The object of testing varieties of cereals is to find out which will pay the farmer best. This may depend on quality, but in general it is an increase of yield which is profitable. Since yield is very variable from year to year and from farm to farm, it is a difficult matter upon which to obtain conclusive evidence.

Yet it is certain that very considerable improvements in yield have been made as the result of replacing the native cereals by improved varieties. As an example of this I may cite the case of Ireland, where varieties of barley have been introduced which were shown by experiment to have an average yield of 15 to 20% above those which they replaced. This represents, probably, a gain to the country of not less than £250,000 per year. As the cost of experiments from the commencement to the present time cannot have reached £40,000, the money has been well spent.

Origin of Varieties

In the first place the ordinary cereals, wheat, barley, oats and so on (maize is not here considered), are all self-fertilized and occur in races broadly distinguished by different botanical characters: Potato Oats, Rivett Wheat, Chevalier Barley, and so forth.

Besides these botanically distinguishable races, it is possible to pick out strains from commercial seed which differ from one another in all kinds of ways, such as time of ripening, percentage of nitrogen and yield, although botanically the same. Many of these strains have been selected from time to time, certainly from the end of the eighteenth century up to the present time.

Finally there are hybrids, the result of deliberate crossing. The selection of the best individuals out of the many thousands which may be grown in three generations is one of the more difficult problems with which the plant breeder has to deal. But it is only after he has made his preliminary selection that his hybrids concern the experimenter who is testing varieties.

Owing to the fact of self-fertilization, the various races, strains, and even to 
a large extent the hybrids, remain practically constant from year to year if once 
pure seed has been obtained. 
Chief Sources of Error

[…] uniform, however little it may vary to the eye, it is found to vary not only from acre to acre but from yard to yard, and even from inch to inch. […] variation is anything but random, so that the ordinary formulae for the comparison of observations, which are based on randomness, are even less applicable than usual. […] will hardly affect experiments […] than the differences which we have to investigate […]. […] of experiments and their interpretation when completed are not quite straightforward that this paper has been written.

Methods of Operating

There are, broadly speaking, two methods of operating: […] ordinary agricultural implements, ploughs, seed […]. […] the farmer, who […] pays some attention to the results: he is to this extent right that large-scale […] in a wire cage, and in fact some varieties which have come out well on the small scale have not done as well in the […], though not at all commonly. Large-scale work, then, […] the large scale that variety […].

Large-Scale Work 

[…] to find out the best variety of barley to grow in that country.

The experiments lasted six years […] varieties were tested; only two, however, Archer and Goldthorpe, were carried through from start to finish. The others were either dropped when they were found to be inferior or were not […] in the first place. The original seed was ordinary commercial seed, and the […]
- -- - - • 
[…] should form the raw material for further manufacturing experiments. This was a wise precaution, as was found recently when a barley that was in other ways among the best proved quite unsuitable as malting material.

The produce of the plots was all valued (in those days […] value […] from year to year), and […] as a method of […] quality. Although the quality varied very much from one farm to another, there was generally only a small difference […] between varieties grown on the same farm in the same season. […] plots were grown and at […] Here […] 51 plots of Archer and the corresponding 51 plots of Goldthorpe […].

The value per acre, then, of the 51 Archer plots varied between 90s. and […], with a mean of […] and a standard deviation of 33.6s. The value per acre of the Goldthorpe plots varied between 99s. and 230s., with […] standard deviation of 33s. The difference, therefore, was 12[…], but this hardly appears significant. Had the Archer and Goldthorpe plots been independent, the standard deviation of their difference would have been about 6.5s.
This brings us to the first principle of all agricultural experiments, viz. that only comparative values are of any use. If we are told that on a certain farm a new barley produced 30 cwt. to the acre, we admit that the crop is good, but we are not much interested. If, in addition, we hear that Archer gave 25 cwt. to the acre on the same farm, we begin to take notice. This is some evidence of the value of the new variety, and it is the difference of 5 cwt. that appeals to us, not the actual yields themselves. In point of fact, of course, the yields in these experiments were not independent. Each Archer plot had a corresponding Goldthorpe plot, and by considering the 51 differences we find that the mean difference between Archer and Goldthorpe has a standard deviation of […]s.
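The benefit of pairing can be illustrated from the formula σ²_{A−B} = σ²_A + σ²_B − 2r_{AB}σ_Aσ_B, divided by the number of pairs to give the s.d. of a mean. The Python sketch below uses the Archer and Goldthorpe s.d.'s of 33.6s. and 33s. over 51 plots. With r_AB = 0 it gives about 6.6s., close to the figure given for independent plots. The value r_AB = 0.9 is purely hypothetical, chosen only to show how correlation between paired plots shrinks the error.

```python
import math

def sd_of_mean_difference(sd_a, sd_b, r_ab, n_pairs):
    """s.d. of the mean of n_pairs differences A - B, from
    sigma^2_{A-B} = sigma_A^2 + sigma_B^2 - 2 r_AB sigma_A sigma_B."""
    var_diff = sd_a ** 2 + sd_b ** 2 - 2 * r_ab * sd_a * sd_b
    return math.sqrt(var_diff / n_pairs)

independent = sd_of_mean_difference(33.6, 33.0, 0.0, 51)
paired = sd_of_mean_difference(33.6, 33.0, 0.9, 51)   # r_AB = 0.9 is hypothetical
print(round(independent, 2), round(paired, 2))
```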

This reduction of the standard deviation of the mean difference, obtained by considering the individual differences between corresponding pairs, depends of course on the fact that corresponding pairs are highly correlated. The last term in the formula

$$\sigma^2_{A-B} = \sigma^2_A + \sigma^2_B - 2r_{AB}\sigma_A\sigma_B$$

is therefore by no means negligible. The art of designing all experiments lies even more in
that the Archer was practically identical with a barley which the Danes called Prentice, which had beaten all others in their long series of experiments. Both Archer and Goldthorpe were, practically speaking, new to Ireland. They, or some improvement* on them, have now almost entirely driven out the other, inferior barleys from most parts of the country.

Such, then, is the sort of error which attaches to large experimental plots, that is to say a standard deviation of about 10 to 15% for a single comparison. This is found to be the order of the error in all ordinary large-scale work. It does not vary very closely with the size of the plot, provided that the plot be above, say, one-tenth of an acre, though there may be a slight decrease of error with increase of size.

It follows that although it is quite within the power of any individual farmer 
to carry out a large-scale experiment (and the larger the easier to carry out), it is 
only by co-operation that enough evidence can be obtained to be of any value. 
This co-operation can in practice only be arranged by a government department, 
a large agricultural company, or a farmers’ association, and it is government 
departments that have had most success. 

Small-Scale Work 

We may next discuss small-scale work, leaving to the end a modification introduced by Dr E. S. Beaven, which combines the advantages of the ordinary large scale with a considerably smaller error. The considerations which led to this modification were derived from experience of small-scale technique.

Preliminary Considerations. Before coming to any actual comparison of varieties on the small scale, attention is directed to some preliminary experiments carried out by three different sets of investigators: Stratton and Wood at Cambridge,† Mercer and Hall at Rothamsted,‡ and Montgomery at Nebraska Agricultural Experimental Station.§

The first harvested x 9 oth acre of mangolds in j^-acre plots; the second, one
acre of wheat in 1/500-acre plots, and an acre of mangolds in ^-acre plots; the
third, two years in succession, harvested the same ^th acre of wheat in xiro“ acre
plots, and all weighed the produce of each plot; Montgomery determined the
percentages of nitrogen as well. All three experiments showed the same thing: that

* In particular a hybrid of Archer with Spratt made by Capt. Hunter, Spratt-Archer 37/6, 
which proved its superiority to Archer and other varieties in “chessboard” trials similar to 
that detailed below. 

† J. Agric. Sci. iii, p. 417, “The interpretation of experimental results”.

‡ J. Agric. Sci. iv, p. 107, “The experimental error of field trials”.

§ Nebr. Agric. Expt. Sta. 25th Ann. Report, 1910-11, pp. 164-80, “Variation in yield and
methods of arranging plots to secure comparative results”; and U.S. Dept. Agric. Bur.
Plant Indust. Bul. 269, “Experiments in wheat breeding: experimental error in the nursery
and variation in nitrogen and yield”.
the variation is not random; the yield varies from point to point with an irregular
regularity; there is consequently correlation between one plot and its neighbours, 
and generally there is a tendency for one end of a field to yield more than the other. 

This is only what is to be expected from a priori considerations; naturally the 
nearer two plots are together the more likely is the soil and its condition to be 
similar on each of them, and the obvious conclusion may be drawn that the smaller 
the plots the more exactly can the yield of adjacent plots be compared. 

Taking the investigation of Mercer and Hall on the 500 “plots” of wheat, it
should be noted that they were only taken as plots at harvest and before cutting
formed an unusually uniform area of one acre, part of a much larger field of wheat.
The mean yield of grain per plot was 3.95 lb., with a range of 2.75-5.14, and a
standard deviation of 0.46 lb., or 11.6 % of the mean weight of a plot.

If two adjacent plots were taken as 1/250-acre plots the s.d. fell to 10 % instead
of the 8.2 % of random sampling.

If four adjacent plots were taken as 1/125-acre plots the s.d. fell to 8.9 % instead
of the 5.8 % of random sampling.

If ten adjacent plots were taken as 1/50-acre plots the s.d. fell to 6.3 % instead
of the 3.7 % of random sampling.

If twenty adjacent plots were taken as 1/25-acre plots the s.d. fell to 5.7 %
instead of the 2.6 % of random sampling.

If fifty adjacent plots were taken as 1/10-acre plots the s.d. fell to 5.1 % instead
of the 1.6 % of random sampling.
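The “random sampling” figures above are what the s.d. would become if neighbouring plots were uncorrelated: 11.6 % divided by the square root of the number of plots combined. A minimal Python sketch, assuming exactly that independence and using only the percentages quoted in the text (the function name is illustrative), reproduces them for comparison with the observed values:

```python
import math

# Figures quoted above for Mercer and Hall's 500 wheat plots.
SD_SINGLE_PCT = 11.6  # s.d. of one 1/500-acre plot, % of the plot mean

# Observed s.d. (% of mean) when k adjacent plots are combined into one.
OBSERVED = {2: 10.0, 4: 8.9, 10: 6.3, 20: 5.7, 50: 5.1}


def random_aggregation_sd(sd_single: float, k: int) -> float:
    """s.d. (% of mean) expected if k plots were combined at random.

    Assumes the k plots are independent, so the percentage s.d.
    falls as 1/sqrt(k).
    """
    return sd_single / math.sqrt(k)


for k, observed in OBSERVED.items():
    expected = random_aggregation_sd(SD_SINGLE_PCT, k)
    print(f"{k:>2} adjacent plots: observed {observed:4.1f} %, random {expected:4.1f} %")
```

Every observed value exceeds its random-sampling counterpart, which is the effect of correlation between neighbouring plots.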

The high value of the standard deviation of the larger plots compared with
that which would have been expected had the aggregation been carried out
randomly is due to a similar cause to that which decreased the error of the
comparison of Archer and Goldthorpe. There is correlation between the neighbouring
small plots which make up the larger plots, so that the last term in the formula

$$\sigma^2_{A+B} = \sigma^2_A + \sigma^2_B + 2r_{AB}\sigma_A\sigma_B$$

is not negligible. This last term is in fact the bridge over a pitfall which has
trapped many, including, as will be shown later, the present writer.
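Run backwards, the formula gives a rough idea of how strong that correlation is for the smallest plots. The sketch below assumes the two adjacent 1/500-acre plots have equal s.d. and solves for $r_{AB}$ from the first two figures quoted above (11.6 % and 10 %); the function name `implied_correlation` is hypothetical, and this calculation is an illustration rather than one made in the paper.

```python
def implied_correlation(sd_single_pct: float, sd_pair_pct: float) -> float:
    """Correlation r between two adjacent plots with equal s.d., implied by
    the s.d. of their combined yield (both as % of the respective mean).

    With sigma_A = sigma_B = sigma, var(A + B) = 2 sigma^2 (1 + r) while the
    mean doubles, so in percentage terms sd_pair = sd_single * sqrt((1 + r) / 2).
    """
    return 2.0 * (sd_pair_pct / sd_single_pct) ** 2 - 1.0


r = implied_correlation(11.6, 10.0)
print(f"implied r between adjacent 1/500-acre plots: {r:.2f}")  # 0.49
```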

In an appendix to Mercer and Hall’s paper I pointed out that advantage may 
be taken of this correlation if we consider the difference between adjacent plots. 

Thus we have 


Size of plot   s.d. of single   Calculated s.d. of   Actual s.d. of       Total acreage required
(acres)        plot as          difference between   difference between   to reduce s.d. of a
               percentage       random pairs         adjacent pairs       comparison to 1 % (acres)

1/500          11.6             16.4                 11.2                 0.50
1/250          10.0             14.1                  9.7                 0.74
1/125           8.9             12.6                  9.3*                1.37
1/50            6.3              8.9                  3.7*                1.10
1/25            5.7              8.1                  3.9*                3.84


* The numbers are too few to do much more than indicate the tendency. 
Except in the case of the 1/125-acre plots we actually find that the standard
deviation of a difference between two plots is less than the standard deviation
of a single plot, and that, working with 1/500-acre plots, the standard deviation
of a comparison between the varieties grown on a total area of half an acre is
as low as 1 %. On the lines of the 2-acre plots more than half a square mile would
have been required. Further, there is every indication that smaller plots would be
still more economical of ground.
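The half-acre figure in the first row of the table follows from the adjacent-pair s.d. of 11.2 %: n pairs give a comparison s.d. of 11.2/√n %, so about 125 pairs, i.e. roughly 250 plots of 1/500 acre, are needed. A short sketch of that arithmetic, assuming the pair differences are independent of one another (the helper name is hypothetical); it is checked here only against the first row:

```python
def acreage_for_target(sd_diff_pct: float, plot_acres: float, target_pct: float = 1.0) -> float:
    """Total area (acres) needed for the s.d. of a comparison to reach target_pct.

    Assumes the differences between adjacent pairs are independent, so the
    mean of n differences has s.d. sd_diff_pct / sqrt(n); each pair uses two plots.
    """
    n_pairs = (sd_diff_pct / target_pct) ** 2
    return n_pairs * 2 * plot_acres


area = acreage_for_target(11.2, 1 / 500)
print(f"{area:.2f} acre")  # 0.50 acre
```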

These have been termed preliminary experiments, and so they are for the 
purpose of this paper; but in point of fact they followed the practical application 
of the principle which has just been outlined, and a further step in advance had 
already been made. 

Carrying the principle of maximum contiguity, which he had deduced a priori, 
to its extreme logical limit, Beaven had compared two varieties in his cage by 
sowing alternate rows. He used a pure line of Archer barley, and one of a variety 
called “Plumage”, which is allied to the Goldthorpe of the Irish experiments. He
also grew ^th acre of each outside the cage and found that whereas the Archer
gave slightly the better yield outside the cage, the cage work gave the yield of 
Plumage some 20 % better than the Archer. 

He sent me the figures to look at, and I found that so far from the correlation
between the yields of adjacent drills being positive, it was significantly negative. 

This was quite unexpected at the time (1905), but the explanation was simple, 
viz. that when a plant of one variety is grown next to one of another variety it is 
abnormally situated, and is subject to abnormal competition. 

In this case the Plumage was a taller barley and shaded the Archer; probably, 
also, it started growth more quickly underground and so annexed more of the soil
than its competitor. Anyhow, it was clear that a comparison of adjacent rows,
with the possibility of interference of this kind, was useless.

The Square Yard Plot 

To avoid this difficulty, Beaven invented in 1909 the “square yard” plot, which
is formed by sowing eight rows 6 inches apart, 4 feet long, and with seed 2 inches
apart in the row. This gives in the first place a plot 4 feet by 4 feet; but at harvest
the outside rows are rejected and the outside 6 inches at each end of all the other
rows, thus leaving the inside square yard* for the measurement of yield free from
the competition of other varieties. 
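The dimensions of the square-yard plot can be checked directly. The sketch below follows the text in taking each of the eight rows to occupy a 6-inch strip, so that the sown area is 4 feet square; the constant names are illustrative.

```python
from fractions import Fraction

# Beaven's square-yard plot, in inches, as described in the text.
ROWS = 8
ROW_SPACING = 6      # each row taken to occupy a 6-inch strip
ROW_LENGTH = 48      # 4 feet
END_TRIM = 6         # discarded from each end of every inner row

sown_area = (ROWS * ROW_SPACING) * ROW_LENGTH      # 48 x 48 in. = 4 ft x 4 ft

kept_width = (ROWS - 2) * ROW_SPACING              # outside rows rejected: 36 in.
kept_length = ROW_LENGTH - 2 * END_TRIM            # 36 in.
kept_area = kept_width * kept_length               # 1296 sq. in. = 1 sq. yd

fraction_used = Fraction(kept_area, sown_area)
print(kept_width, kept_length, fraction_used)      # 36 36 9/16
```

Only 9/16 of the sown ground is harvested for yield; the rest serves as guard rows and ends.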

* There has been some controversy in America as to the advisability of testing varieties 
in alternate rows, but lately T. A. Kiesselbach (J. Amer. Soc. Agron. (1919), No. 6, pp. 235-41,
“Experimental error in field trials”; pp. 242-7, “Plant competition as a source of error in 
field plots”) has come to much the same conclusion as Beaven, viz. that although certain 
varieties may not under some circumstances interfere with one another, yet it is dangerous 
to allow any chance of the experiment being subject to this source of error, and that the only 
safe thing to do is to surround each experimental area with a border of the variety grown 
upon it, and to discard this border at harvest.
Table I

Irish Experimental Barley Plots. Yield and Money Value per Acre of
Archer and Goldthorpe, 1901-6



1901. Growers: McCarthy, Hawkins, Dwan, Wolfe. Places: Ballinacurra, Whitegate, Thurles, Nenagh.

1902. Growers: McCarthy, Hawkins, Wolfe, Willington, Gorman, Nunn. Places: Ballinacurra, Whitegate, Nenagh, Birr, Enniscorthy, Castlebridge.

1903. Growers: McCarthy, Hawkins, Wolfe, Willington, Gorman, Nunn, Quinn, Kearney. Places: Ballinacurra, Whitegate, Nenagh, Birr, Arnestown, Castlebridge, Carlingford, Greenore.

1904. Growers: McCarthy, Hawkins, Wolfe, Willington, Kelly, Allardyce, Roche, Nunn, Kearney, Segrave. Places: Ballinacurra, Whitegate, Nenagh, Birr, Portarlington, Monasterevan, New Ross, Castlebridge, Carlingford, Dunleer.

1905. Growers: McCarthy, Hawkins, Wolfe, Willington, Luttrell, Kelly, Matthews, Nunn, Dooley, Kearney, Segrave. Places: Ballinacurra, Whitegate, Nenagh, Birr, Monasterevan, Portarlington, Tullamore, Castlebridge, New Ross, Carlingford, Dunleer.

1906. Growers: McCarthy, Hawkins, Wolfe, Willington, Luttrell, Mulhall, Matthews, Tennant, Nunn, Dooley, Kearney, Segrave. Places: Ballinacurra, Whitegate, Nenagh, Birr, Monasterevan, Tullamore, Bagnalstown, Castlebridge, New Ross, Carlingford, Dunleer.


Central Plain 


Wexford 


Central Plain 


Wexford 


Central Plain 


Wexford 


Goldthorpe 


V. P. A. 



Central Plain 


Central Plain 
Wexford 


Central Plain 
Wexford 


Barrels Stones £ s. d. Barrels Stones


11 

4 

9 

0 

0 

7 

0 

5 

2 

10 

3 

8 

3 

0 

7 

12 
3

15 

2 

11 

13 

0 

13 

14 

11 

0 

11 

0 

8 

13 

0 

10 

0 

8 

3 

12 

6 

8 

13 

0 

11 

14 

8 

11 

14 

0 

10 

12 

0 

13 

0 

10 

3 

12 

2 

9 

4 

0 

13 

6 

10 

2 

12 

6 

9 

16 

0 

9 

3 

7 

6 

11 

5 

9 

2 

0 

11 

14 

9 

2 

11 

3 

8 

18 

0 

11 

4 

9 

0 

6 

10 

4 

13 

0 

7 

4 

5 

5 

8 

12 

7 

1 

0 

7 

5 

5 

19 


5 9 0 

7 9 0 

4 10 0 
9 16 0 

8 4 0 
8 12 0 

7 15 0 

8 7 0 
10 9 0 

8 17 0 

9 12 0 
8 1 0 

5 16 0 
7 8 0 

6 7 0 
9 9 0 


6 11 0 
6 6 0 
5 15 0 
7 16 0 

7 0 0 
5 19 0 

9 8 0 

8 4 0 

9 7 0 
9 4 0 
9 0 0 

8 7 0 
5 6 0 
4 19 0 
7 9 0 

9 7 0 


9 

8 

0 

13 

1 

9 

16 

0 

9 

8 

0 

11 

5 

8 

14 

0 

10 

14 

0 

15 

10 

10 

3 

0 

11 

14 

0 

13 

8 

10 

11 

0 

11 

0 

0 

12 

13 

9 

14 

0 

8 

19 

0 

10 

8 

7 

17 

0 

10 

18 

0 

10 

10 

8 

1 

0 

7 

15 

0 

11 

6 

8 

0 

0 

10 

0 

0 

13 

10 

10 

12 

0 

9 

12 

0 

11 

4 

7 

12 

0 

11 

11 

0 

12 

8 

9 

19 

0 

7 

6 

0 

9 

11 

7 

2 

0 

7 

12 

0 

8 

14 

6 

9 

0 

8 

14 

0 

8 

13 

6 

11 

0 

8 

0 

0 

9 

15 

7 

6 

0 

7 

3 

0 

10 

9 

7 

17 

0 

9 

6 

0 

13 

14 

10 

7 

0 

6 

16 

0 

8 

11 

6 

11 

0 

11 

7 

0 

13 

14 

10 

9 

0 

8 

19 

0 

10 

9 

7 

18 

0 

10 

7 

0 

12 

5 

9 

0 

0 

8 

9 

0 

12 

12 

9 

8 

0 

10 

8 

0 

13 

6 

9 

16 

0 


Note. The Irish barrel of barley contains 16 stones. 
So far as I am aware, no one has made any further inquiry as to the most
economical size of plot; the square-yard plot only utilizes for yield determination
9/16 of the experimental area, and to make it smaller would waste still more
ground, while the larger the plot the more we depart from the principle of
maximum contiguity.


There are probably not enough data to discover by the calculus the size of plot
which gives the minimum probable error per acre, and no one seems to have
faced the labour of an experimental determination. At all events, without any
further investigation the square-yard plot has been adopted as the unit in some
six or seven experimental cages in the British Isles.


Comparison on a “Chessboard” 

Having adopted the unit, it was a comparatively simple matter to set units of 
two varieties in a “chess” or “chequer” board: subsequently it was found that 
more than two varieties could be economically compared at the same time. 

To illustrate the problems which arise when we come to compare several
varieties grown together on a “chessboard”, we may take Beaven’s No. 1 Yield
Experiment of 1913.*

In this, 20 plots of each of eight races of barley were grown on a regular system 
of repetition, and the following observations were made for each plot: 

Number of plants, 

Number of ears, 

Weight of ears, 

Weight of straw. 

For the purpose of this illustration we need only consider yield of corn, i.e.
weight of ears.

The eight races consisted of

Four strains of Archer:
English Archer and 7A: selections made by Beaven.
Irish or Early Archer and Irish Archer, No. 5: selections made by Capt. H. Hunter, B.Sc., of the Irish Department of Agriculture.

Plumage: a selection made by Beaven which originated in Denmark; a wide-eared barley somewhat like Goldthorpe.

Each of these was, of course, descended from a single seed a few generations
back, and

Three hybrids:
145 and 145/46: from a Plumage-Archer cross made by Beaven, the second being a re-selection from the first.
“Biffen”: selected by the Professor of that name from a Plumage-Archer cross of his own.


In order to simplify the comparison of errors it is best to work as long as
possible, not with the standard error but with the “variance”, or square of the
standard error. It has two advantages: (i) variances can be added or subtracted
without the preliminary squaring and subsequent extraction of the square root, and
(ii) the area required to give any required accuracy varies directly with it; in
order to give the same error, a comparison with a variance of 60 only requires
half as much ground as a comparison with a variance of 120.

* Vide Diagram I, p. 97.

Further, the variance taken in each case will be the variance of the average of
20 plots, or of differences between plots, or whatever it may be, and to get this we
divide by 19, and not by 20, to correct for the small number.
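The divisor rule just described can be written as a small helper. This is a sketch using Python's `statistics.variance`, which already divides by n - 1 (19 for 20 plots); the function name is hypothetical.

```python
from statistics import variance


def variance_of_average(values: list[float]) -> float:
    """Variance of the mean of `values`.

    statistics.variance divides the sum of squared deviations by n - 1;
    dividing that by n gives the variance of the average of the n values.
    """
    return variance(values) / len(values)


# For 1, 2, 3, 4 the sum of squares about the mean is 5, so this is 5 / 3 / 4.
print(variance_of_average([1, 2, 3, 4]))
```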

The following table gives the means and variances of the average of 20 plots
for the eight races:


Table II

Race                   Mean weight per plot,   Variance of the average
                       grammes                 of 20 plots

145/46                 318.7                    94.7
Early Archer           306.5                   138.9
7A                     304.6                    80.7
145                    300.7                    94.9
English Archer         297.8                   128.8
Plumage                295.2                   150.8
Irish Archer, No. 5    276.5                    81.7
Biffen                 270.8                   142.0

Average                296.4                   114.1


Correction for Position* 

There is a great disadvantage in correcting any figures for position, inasmuch
as it savours of cooking, and besides the corrected figures do not represent
anything real. It is better to arrange in the first place so that no correction is needed.

In the present case the “vertical” arrangement is satisfactory, but as to right
and left it is not so. English Archer averages 0.2 rows to the left of 145, 0.4 to the
left of 145/46, and so on, up to 1.4 rows to the left of Biffen. As the average value
per plot of a row is about 3.3 grammes higher than that of the row on its left, it
might be thought right to make the following corrections:


145/46            318.7 + 1.0 = 319.7
Early A           306.5 - 0.3 = 306.2
7A                304.6 - 1.7 = 302.9
145               300.7 + 1.7 = 302.4
English A         297.8 + 2.3 = 300.1
Plumage           295.2 + 0.3 = 295.5
Irish A, No. 5    276.5 - 1.0 = 275.5
Biffen            270.8 - 2.3 = 268.5


* For an elaborate method of correction for inequality of soil, see Pearl, “A method of
correcting for soil heterogeneity in variety tests”, J. Agric. Res. v, p. 1039.

In this paper Dr Pearl has corrected yield on the analogy of a contingency table. The method,
which is probably as good a way as any of correcting for position, seems to me to be open to
serious objections. A blot on the paper is the publishing of a “probable error” calculated
from cases without either correcting for the very small number or calling attention to
the fact that they are appreciably too low.
The error of a comparison would no doubt be reduced very slightly, as it generally
is by any operation of this kind.

In any case the order is not altered, and I do not think the correction is worth 
making; the proper course would have been to reverse the order of the plots half 
way through so as to compensate for a possible tendency to improve from one 
end of the experimental area to the other. 
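That the correction leaves the ranking unchanged can be checked mechanically. The sketch below copies the means and corrections listed above and compares the two orderings; nothing beyond those figures is assumed, and the helper name is illustrative.

```python
# Mean weight per plot (grammes) and position correction for each race,
# copied from the list above.
races = {
    "145/46": (318.7, +1.0),
    "Early Archer": (306.5, -0.3),
    "7A": (304.6, -1.7),
    "145": (300.7, +1.7),
    "English Archer": (297.8, +2.3),
    "Plumage": (295.2, +0.3),
    "Irish Archer, No. 5": (276.5, -1.0),
    "Biffen": (270.8, -2.3),
}


def ranking(scores: dict[str, float]) -> list[str]:
    """Race names from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


raw = {name: mean for name, (mean, _) in races.items()}
corrected = {name: round(mean + corr, 1) for name, (mean, corr) in races.items()}

print(corrected["Biffen"])                 # 268.5
print(ranking(raw) == ranking(corrected))  # True: the order is not altered
```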


Variance in Table II 

With the small numbers in question the variance figures do not differ
significantly, but incidentally there is no indication that the hybrids are more variable
in yield than the pure lines.

In order to get a clear idea of what these figures mean, let us suppose that a
standard error of 1 % is desired, say 3 grammes, a variance of 9. That would
require an area 114.1/9, or 12.7 times as large as the present 20 plots.

If now the plots had been randomly placed, the variance of a comparison 
between two of the races would have been approximately 228, and about 25 times 
as much ground as was used would have been required to reduce the standard 
error of a comparison to 1 %. 
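Because the variance of a mean falls in proportion to the area sown, the multipliers just quoted are ratios of variances. A sketch with the figures from Table II, assuming that random placement would make the comparison variance simply the sum of the two race variances (about 2 × 114.1); the constant and function names are illustrative.

```python
AVG_VARIANCE = 114.1  # average variance of the average of 20 plots (Table II)
TARGET_SE = 3.0       # grammes, roughly 1 % of a plot yield near 300 g


def area_multiplier(variance: float, target_se: float) -> float:
    """Factor by which the area must grow for the standard error to reach
    target_se, given that variance falls in proportion to the area sown."""
    return variance / target_se ** 2


one_race = area_multiplier(AVG_VARIANCE, TARGET_SE)          # about 12.7
random_pair = area_multiplier(2 * AVG_VARIANCE, TARGET_SE)   # about 25
print(f"{one_race:.1f} {random_pair:.1f}")
```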

In order to give a general idea of the nature of the variability, chiefly due to 
soil, which has to be regarded as error when we consider the yield of varieties, 
Diagram II has been prepared in which each 20 grammes of yield above 100 
grammes below the average yield of the variety is represented by a diagonal line 
drawn across the square representing the plot. It will be noticed that the shading 
grows heavier towards the right of the diagram, and that while it is by no means 
regular, the correlation between the shading of neighbouring plots is obvious to 
the eye. 

The arrangement of the different races in a chessboard is of course designed to 
take advantage of this correlation by comparing always neighbouring plots as in 
the following example which concerns the first pair of races in the table. 

Beginning at the left hand of Diagram I, 145/46 is in the middle of the first 
vertical line, and Early Archer at the top—the former being indicated by the letter 
C, and the latter by E. The yield of the first is 265.6, and of the second, 230.1. That
gives a positive difference of 35.5. The next appearance is in the third line, again
a positive difference, this time of 44.4. In the third occurrence the 145/46 is in
the fourth line, and the Early Archer in the fifth line, and the difference this time
is negative and 37.4, and so on.

The variance of the average of the 20 differences thus obtained is 124.0, very
much less than the 233.6, which is the sum of the variances of the averages of the
two races.
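The gap between 233.6 and 124.0 measures the covariance between neighbouring plots of the two races, since $\sigma^2_{A-B} = \sigma^2_A + \sigma^2_B - 2r_{AB}\sigma_A\sigma_B$. The sketch below uses only the three variances quoted in the text to back out the implied correlation; it is an illustration, not a calculation made in the paper.

```python
import math

var_a = 94.7       # variance of the average of 20 plots, 145/46 (Table II)
var_b = 138.9      # variance of the average of 20 plots, Early Archer
var_diff = 124.0   # variance of the average of the 20 paired differences

# var(A - B) = var(A) + var(B) - 2 cov(A, B), so the shortfall is twice the covariance.
cov = (var_a + var_b - var_diff) / 2
r = cov / math.sqrt(var_a * var_b)
print(f"sum {var_a + var_b:.1f}, implied r = {r:.2f}")  # sum 233.6, implied r = 0.48
```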

Now, if there were only two races in the chessboard it would be comparatively 
straightforward: the standard deviation would be found from the variance, and
Sheppard’s tables (or preferably, with such small numbers, “Student’s”) would be
used to judge the significance of the mean difference. In point of fact, however, 
the two races do not stand alone, and the question arises whether it would not be 
better to take the average variance of all the 28 differences between all the possible 
pairs of eight races. 

Of course, it is not likely that all our races would have the same variance, but 
with our small numbers such differences as there may be are almost certainly 
swamped by the error of random sampling, which, as pointed out above, will 
account for the observed values. From that point of view then it is better to 
average. 

Again, all the comparisons are not of equal value: Irish Archer No. 5 is always 
found exactly on the right of English Archer, while Plumage is either three 
squares above English Archer or two below and one row to the right, and as will 
be shown later, there are indications that this is enough to affect the variance. 
Still it is not a very big thing, and the advantages of using a single figure far
outweigh the slight loss of accuracy. I have calculated the 28 variances and they
range from 44.1 (English Archer-Irish Archer No. 5) to 192.9 (Early Archer-
Plumage), with a mean of 107.9. This is slightly lower than the 114.1, the average
variance of the races. In other words, we have gained by chessboarding to the
extent that we are as accurate as if we had devoted twice the area to plots
randomly arranged.
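The “twice the area” statement can be checked against the figures in the text: a randomly arranged comparison would have a variance of about 2 × 114.1 = 228.2, against 107.9 on the chessboard. A short sketch of the ratio, assuming, as above, that random placement removes all correlation between the two races:

```python
AVG_RACE_VARIANCE = 114.1      # Table II
CHESSBOARD_VARIANCE = 107.9    # mean of the 28 variances of paired comparisons

random_variance = 2 * AVG_RACE_VARIANCE   # comparison if plots were placed at random
ratio = random_variance / CHESSBOARD_VARIANCE
print(f"{ratio:.2f}")  # 2.11
```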

The calculation of these 28 variances is tedious, but fortunately there is a short 
cut which gives an identical result. 

In the following proof capital subscripts indicate variance directly measurable, 
which is taken as the mean value of such variance, while small subscripts indicate 
variance deducible from the observations. 

If we suppose the total variance $\sigma_t^2$ of mn plots (i.e. n groups of one of each of
m races subject to the error of random sampling) to be divided into three parts:

(i) that due to the m races if measured without error: $\sigma_r^2$;

(ii) that due to the position of the n groups of m races from left to right of the
diagram (in this case 20 groups of eight), also measured without error: $\sigma_g^2$;

(iii) the casual error, which is the only part subject to random sampling: $\sigma_c^2$;

these three parts may be assumed to be independent, so that

$$\sigma_t^2 = \sigma_r^2 + \sigma_g^2 + \sigma_c^2;$$

also the variance of the means of the races as we measure them is

$$\sigma_R^2 = \sigma_r^2 + \frac{\sigma_c^2}{n} - \frac{\sigma_c^2}{mn},$$

the last term being due to the fact that we have only mn cases to give us the
mean.
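Similarly, the variance of the means of the groups as we measure them is

$$\sigma_G^2 = \sigma_g^2 + \frac{\sigma_c^2}{m} - \frac{\sigma_c^2}{mn},$$

and the total variance as we measure it is

$$\sigma_T^2 = \sigma_r^2 + \sigma_g^2 + \sigma_c^2 - \frac{\sigma_c^2}{mn},$$

from which, eliminating $\sigma_r^2$ and $\sigma_g^2$, we find

$$\sigma_c^2 = \frac{mn(\sigma_T^2 - \sigma_R^2 - \sigma_G^2)}{(m-1)(n-1)},$$

and consequently $2\sigma_c^2/n$, which is the variance of a comparison between n groups of two races, is*

$$\frac{2m(\sigma_T^2 - \sigma_R^2 - \sigma_G^2)}{(m-1)(n-1)}.$$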


* In my first attempt to obtain this formula, I overlooked the $-\sigma_c^2/mn$ in the three
equations for $\sigma_T^2$, $\sigma_R^2$ and $\sigma_G^2$. It was only after receiving a letter from Mr R. A. Fisher, who had
independently arrived at the correct formula, that I found my mistake. Mr Fisher sent me
two proofs, one of which was purely algebraical, proving in his notation the identity

$$\frac{SS(X_{pq} - \bar X_p - \bar X_q + \bar X)^2}{(m-1)(n-1)} = \frac{SS(X - \bar X)^2 - mS(\bar X_q - \bar X)^2 - nS(\bar X_p - \bar X)^2}{(m-1)(n-1)}$$

and the other, which he himself prefers, I append below: 

“Let there be n trials indicated by suffices 1, ..., q, ..., n of each of m varieties similarly
indicated by suffices 1, ..., p, ..., m.

Recognizing that not only differences of variety but differences in the conditions of the 
trials may have affected the yields, we may obtain an estimate of what the variability would 
be if the conditions of any one trial could be replicated in a number of experiments with the 
same variety, provided the following simple assumptions hold good. The yield obtained in 
any experiment is the sum of three quantities, one depending only on the variety; a second, 
depending only on the ‘trial’; and a third, which may be regarded as the ‘experimental
error’ varying independently of variety and trial in a normal distribution about zero with
a standard deviation which it is desired to estimate. 

To obtain such an estimate we may fit the system of yields $X_{pq}$ with a system of values
$A_p + B_q$, choosing the latter so that

$$SS(X_{pq} - A_p - B_q)^2 \qquad (1)$$

is a minimum. Any one of the m + n quantities $A_p$, $B_q$ may be assigned an arbitrary value,
and the remaining m + n - 1 are then determinate: the observed values may therefore differ
from those fitted in (m - 1)(n - 1) degrees of freedom, and the corresponding estimate of
the standard deviation ascribable to experimental error will be found by dividing the
minimum value of (1) by (m - 1)(n - 1). Evidently (1) will be a minimum if

$$A_p + B_q = \bar X_p + \bar X_q - \bar X,$$
To obtain the variance by this formula is a comparatively simple operation. In
this case, owing to the fact that I grouped the 160 observations in 10-gramme
groups, I got 109.3 by the short cut instead of 107.9, but it really should give an
identical value.
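That the short cut gives exactly the average of the pairwise variances can be checked numerically. The sketch below, with hypothetical helper names, computes both the average over all pairs of races (28 pairs for eight races) of the variance of the mean difference, and $2m(\sigma_T^2-\sigma_R^2-\sigma_G^2)/((m-1)(n-1))$ with the three measured variances taken with divisors mn, m and n. The data are randomly generated for illustration and are not Beaven's figures.

```python
import math
import random
from itertools import combinations
from statistics import pvariance, variance


def pairwise_mean_variance(yields: list[list[float]]) -> float:
    """Average over all pairs of races of the variance of the average of the
    n paired differences (divisor n - 1, then divided by n)."""
    n = len(yields[0])
    per_pair = [
        variance([x - y for x, y in zip(a, b)]) / n
        for a, b in combinations(yields, 2)
    ]
    return sum(per_pair) / len(per_pair)


def short_cut(yields: list[list[float]]) -> float:
    """2m(s_T^2 - s_R^2 - s_G^2) / ((m - 1)(n - 1)), with the total, race and
    group variances taken "as we measure them" (divisors mn, m and n)."""
    m, n = len(yields), len(yields[0])
    s_t = pvariance([x for race in yields for x in race])
    s_r = pvariance([sum(race) / n for race in yields])
    s_g = pvariance([sum(race[q] for race in yields) / m for q in range(n)])
    return 2 * m * (s_t - s_r - s_g) / ((m - 1) * (n - 1))


# Illustrative random data only: 8 races x 20 groups, with a fertility
# trend from left to right shared by every race in the same group.
rng = random.Random(1913)
trend = [3.3 * q for q in range(20)]
yields = [[300 + t + rng.gauss(0, 25) for t in trend] for _ in range(8)]

print(math.isclose(pairwise_mean_variance(yields), short_cut(yields)))  # True
```

The two agree because, once race and group means are removed, each pairwise difference depends only on the residuals $X_{pq}-\bar X_p-\bar X_q+\bar X$, which sum to zero within every group.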

Taking the square root we get a standard deviation of 10.4 grammes or
thereabouts for the standard error of a comparison, i.e. a probable error of about 2.4 %.
This is probably as near as it is worth while going in any one season, for the
experiment must be repeated several times to sample the weather properly, and cage area
is too valuable to expend more than is absolutely necessary on a single experiment.
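For completeness, the arithmetic of the last paragraph as a sketch. It assumes the usual normal-curve factor 0.6745 between probable error and standard error, and that the percentage is taken relative to the average plot weight of 296.4 g from Table II (the text does not say which mean was used).

```python
import math

COMPARISON_VARIANCE = 107.9  # grammes squared
MEAN_PLOT_YIELD = 296.4      # grammes, average of Table II
PE_FACTOR = 0.6745           # probable error / standard error for a normal curve

se = math.sqrt(COMPARISON_VARIANCE)
pe_pct = 100 * PE_FACTOR * se / MEAN_PLOT_YIELD
print(f"s.e. {se:.1f} g, probable error {pe_pct:.1f} %")  # s.e. 10.4 g, probable error 2.4 %
```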

Before leaving this subject of chessboards, I would like to show in rather more
detail that even with such small plots as these, slight differences in the
arrangement within the group tend to increase the variance over that due to the ideal
juxtaposition.

I have, therefore (see Diagram III, p. 104), separated the various kinds of 
comparisons and averaged the variance, in each case as that of the average of 
20 differences. 

The figures are not of course worth a great deal, but there is a marked tendency 
for the comparisons between the more distant plots to be the less accurate. 

For purposes of illustration, I have correlated the distances with the variance
for the 13 positions by the Spearman method, and get ρ = +0.41.

where $\bar X_p$ is the mean of the values obtained with variety p, $\bar X_q$ the mean of the values obtained
with trial q, and $\bar X$ is the general mean.

The actual evaluation is most conveniently carried out in the following form of the analysis 
of variance: 

Variance                  Degrees of freedom    Sum of squares

(a) Due to variety        m - 1                 $nS(\bar X_p - \bar X)^2$
(b) Due to trial          n - 1                 $mS(\bar X_q - \bar X)^2$
(c) Random variation      (m - 1)(n - 1)        $SS(X_{pq} - \bar X_p - \bar X_q + \bar X)^2$

(d) Total                 mn - 1                $SS(X - \bar X)^2$
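Fisher's table can be computed directly. The sketch below, with a hypothetical function name, returns degrees of freedom and sums of squares for lines (a) to (d); unlike the procedure described next, it computes line (c) directly from the residuals rather than by subtraction, so that the additivity of the table can be checked. The small data set is made up for illustration.

```python
def analysis_of_variance(x: list[list[float]]) -> dict[str, tuple[int, float]]:
    """Two-way analysis of variance for yields x[p][q]: m varieties (rows)
    by n trials (columns). Returns {line: (degrees of freedom, sum of squares)}."""
    m, n = len(x), len(x[0])
    grand = sum(map(sum, x)) / (m * n)
    variety_means = [sum(row) / n for row in x]
    trial_means = [sum(x[p][q] for p in range(m)) / m for q in range(n)]
    cells = [(p, q) for p in range(m) for q in range(n)]
    return {
        "(a) variety": (m - 1, n * sum((v - grand) ** 2 for v in variety_means)),
        "(b) trial": (n - 1, m * sum((t - grand) ** 2 for t in trial_means)),
        "(c) random": ((m - 1) * (n - 1), sum(
            (x[p][q] - variety_means[p] - trial_means[q] + grand) ** 2 for p, q in cells)),
        "(d) total": (m * n - 1, sum((x[p][q] - grand) ** 2 for p, q in cells)),
    }


# Made-up yields: 3 varieties, each grown in the same 4 trials.
table = analysis_of_variance([
    [10.0, 12.0, 11.0, 13.0],
    [14.0, 15.0, 13.0, 16.0],
    [9.0, 11.0, 10.0, 12.0],
])
for line, (df, ss) in table.items():
    print(f"{line:12} df={df:2}  SS={ss:7.3f}  mean square={ss / df:6.3f}")
```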

The sum of squares in line (c) is calculated by subtracting the values of lines (a) and
(b) from the total. If either variety or ‘trial’ were without significant effect on the yield, the
corresponding mean square would not differ significantly from that of line (c). To test the
significance of such a difference we may use the fact that the estimates of variance in (a),
(b) and (c) are all independent, and when m and n are fairly large the natural logarithm of
the mean square has standard deviation $\sqrt{2/n_1}$, where $n_1$ is the number of degrees of freedom.
In comparing two such independent estimates of the mean square, we therefore obtain the 
difference of their natural logarithms, and assign to it a standard deviation 
Diagram III



The Half-Drill Strip Method*

The small-scale work with which I have just dealt affords a means of picking 
out good varieties which can be tested in field trials. The whole eight varieties 
were tested on about acre, sowing about a quarter of a pound of seed for each 
race. We now proceed to the most accurate method yet devised for field trials by 
which two varieties are compared on a total area of 5200 square yards, just over 
an acre, with, in the case which I shall give you, a standard error of 0.63 %. Of
course, it will not necessarily be as low as this always. 

The field is cultivated as usual up to the time of sowing, except that particular 
care is taken to clean the ground of weeds. 

* For a full account vide “Trials of new varieties of cereals”, by E. S. Beaven, J. Minist.
Agric. xxix, Nos. 4 and 5 (1922).
When sowing, the seed box of the drill is divided into two across the middle,
and the middle coulter put out of action. The seed of the two varieties is put in 
the seed box, one on each side of the division. Thus when sowing a drill strip, one 
half (i.e. 6 or 7 rows) is sown with one variety and the other half with the other. 
On turning the drill at the end, the next strip is sown so that two half strips of 
the same variety are next each other, but care is taken to leave an interval 
between the two drill strips exactly equal to the gap in the middle of each drill 
strip between the two varieties. It requires careful steering but it can be done. 

When the experimental field is sown, we get first a single half-drill strip of one 
variety, then two of the other, then two of the first and so forth, ending with a 
half-drill strip of the first. This ending is necessary in order to discount any 
fertility slope from one end to the other of the field. The space outside the
experimental area should be sown all round with a similar grain, as the outside is
naturally abnormal and is more liable to attacks from all kinds of enemies. 
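The order of half-drill strips across the field can be generated mechanically. A sketch, labelling the two varieties A and B (hypothetical names) and assuming an even number of drill strips, so that the field begins and ends with a single half strip of the first variety:

```python
def half_drill_layout(n_drill_strips: int) -> list[str]:
    """Half-drill strips across the field, left to right, for varieties A and B.

    Each pass of the drill sows A on one half and B on the other; turning at
    the end of the field reverses the order, so like halves meet: A | B B | A A | ...
    """
    layout: list[str] = []
    for i in range(n_drill_strips):
        layout.extend(["A", "B"] if i % 2 == 0 else ["B", "A"])
    return layout


print("".join(half_drill_layout(4)))  # ABBAABBA
```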

At harvest the outside row of each half-drill strip next to the other variety is 
pulled up by hand and discarded to eliminate the “border” effect, and also to 
facilitate the use of the ordinary reaping machine. If the two varieties do not ripen 
together, one must be cut by hand when ripe, but if there is so little difference
that both can be cut on the same day the reaping machine can be used on both. 
In either case each half-drill strip is cut in such a way that the produce of each 
3 ^" acre can U P in two sheaves separately. In Beaven’s case ten such 

5 ^-acre plots went to each half-drill strip. 

These sheaves can be weighed on the field, and so we can get the total produce 
of the field, in plots of g^-acre and can compare each g^-acre with an adjoining 
one of the other variety. 

Two things are to be noted at this point: (1) That without a very great deal of 
trouble the plots cannot be threshed out separately, but, fortunately, it has so 
far always been found where the matter has been put to the test that the variability 
of the yield of grain expressed as a percentage of the grain is less than the
variability of the total yield expressed as percentage of total yield. In the Mercer and
Hall experiment, the standard errors were 11.6 and 11.9 %, and Beaven’s
experience has been similar. Thus the figure which we obtain for the standard error
is likely to be in excess of the truth. (2) From a practical point of view it is easier
to work with a few half-drill strips than a larger number of short ones, but if we
depend on the weights of a few drill strips, there is considerable uncertainty about
the standard error of the result. It was hoped that by determining the standard
error of the difference between adjacent g^-acre plots, we could deduce the
standard error of the average of n such differences by the formula $\sigma_a = \sigma/\sqrt{n}$, so
that it would be immaterial whether the drill strips were long and few or short and 
many, as long as altogether there were n pairs of adjacent subplots. Indeed up to 
the time when I came to write this section, it was believed that this could be done. 
Beaven showed me his figures before publication, and I did not at the time observe
that the formula cannot be used without further investigation, nor, so far as I
am aware, has anyone else drawn attention to it. Nevertheless, I think it will be
clear from the general considerations which have been advanced throughout the
paper that there is a danger that the differences between corresponding
constituent plots of a drill strip, even when they are as narrow as these, will tend to be
correlated, and the formula $\sigma_a = \sigma/\sqrt{n}$, which requires independence of the
individuals which are to be averaged, cannot be used without correction.* That this
is so in the particular case which we are considering is made highly probable
from the fact that the variance, expressed in terms of the percentage of the total
weight of C, of the difference between the total produce from A and C is 0.664
of the total weight of C when calculated from the 27 differences between adjacent
half-drill strips, while it is only 0.301 when calculated from the 270 differences
between adjacent subplots. The two figures should be the same within the error
of random sampling, but they probably differ by more than twice their standard
deviation.


The results of the 1921 Trial are shown in Tables III and IV, which are taken, 
with his kind permission and that of the Ministry of Agriculture, from the
Supplement to Beaven’s paper, and give the weights of the sheaves on the individual
half-drill strips, and on 243 of the 270 “plots” which go to make up the half-drill
strips respectively. 

It will be seen that by taking the differences between adjoining half-drill strips 
(or plots) a large part of the error is, as usual, eliminated. 


* A fallacy arising from a similar neglect of correlation has come under my notice in some
American work, but there the absurdity is more easily demonstrated. In the J. Amer. Soc.
Agron. ix, p. 138, A. G. McCall proposed that, in order to save the trouble of harvesting and
weighing 1/10th acre plots, a number of square yards should be cut out and harvested
separately, the square yards being taken systematically through the 1/10th acre plot, and the
yield per acre calculated from these square yards. So far, so good: by taking enough square
yards the slight loss of accuracy may perhaps be made up by gain in time or feasibility of
operating. But in 1919, Arny and Steinmetz, J. Amer. Soc. Agron. xi, pp. 88, 89, applying this
method, compared the error of the yield calculated from a few square yards cut from each of a
number of 1/10th acre plots with that calculated from the 1/10th acre plots themselves. They
found it substantially greater, but, say they, by increasing the number of square yards cut from
each 1/10th acre plot to n, we can decrease the error in the proportion $1/\sqrt{n}$, and so we can
actually determine the yield more accurately by weighing up 10 or 20 square yards than by
weighing up the whole half acre. It is rather surprising that they did not realize that there are
484 square yards in 1/10th acre, so that by taking 484 square yards they would be likely to be
more accurate than if they took any lesser number, and a fortiori tremendously more accurate
than they would be if they took the same 484 square yards and called it 1/10th acre! Of course
their formula also should be $\sigma\sqrt{\frac{1 + (n-1)r}{n}}$, where r is the correlation between the
yields on the square yards composing 1/10th acre plots, and not $\sigma/\sqrt{n}$.

The same fallacy has been used to extol the “rod row” method of determining yield, i.e.
the method of cutting along the drill a row one rod in length to represent the yield of the plot
from which it is cut.
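The corrected formula in the footnote can be sketched as below, assuming every pair of square yards in a plot shares the same correlation r (the value r = 0.2 is purely hypothetical). The point is the limit: as n grows, the corrected standard error levels off near $\sigma\sqrt{r}$ instead of shrinking towards zero as $\sigma/\sqrt{n}$ would suggest.

```python
import math


def se_of_mean(sigma: float, n: int, r: float = 0.0) -> float:
    """Standard error of the mean of n observations, each with standard
    deviation sigma, when every pair has correlation r.

    With r = 0 this reduces to the familiar sigma / sqrt(n).
    """
    return sigma * math.sqrt((1 + (n - 1) * r) / n)


sigma = 1.0
r = 0.2  # hypothetical correlation between square yards in one plot
for n in (1, 10, 20, 484):
    naive = se_of_mean(sigma, n)        # what sigma/sqrt(n) promises
    actual = se_of_mean(sigma, n, r)    # what correlated yards deliver
    print(f"n={n:4d}  naive={naive:.3f}  with r={r}: {actual:.3f}")

# As n grows the corrected error levels off near sigma * sqrt(r), not zero.
print(f"limit sigma*sqrt(r) = {sigma * math.sqrt(r):.3f}")
```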
Table III. Warminster Field Variety Trial, 1921. Half-Drill Strip Weights,
comparing two races of barley, viz. “C” and “A”. Area of each half-drill strip
= 100 sq. yd. Total area = 2700 sq. yd. = 0.56 acre for each race. Showing total
weight of sheaves on each half-drill strip

 Half-drill strip   Weight of sheaves on       Difference
 number             half-drill strip (lb.)     (lb.)
   “C”     “A”        “C”        “A”           “A” − “C”
     1       2     165.4      164.6          −0.8
     4       3     159.5      173.4         +13.9
     5       6     169.3      169.3           0.0
     8       7     179.8      174.9          −4.9
     9      10     172.5      177.6          +5.1
    12      11     170.7      182.9         +12.2
    13      14     173.3      167.5          −5.8
    16      15     166.1      178.5         +12.4
    17      18     174.5      170.3          −4.2
    20      19     163.3      176.0         +12.7
    21      22     166.0      159.1          −6.9
    24      23     161.2      168.7          +7.5
    25      26     169.3      164.2          −5.1
    28      27     156.5      167.0         +10.5
    29      30     160.9      160.2          −0.7
    32      31     153.2      164.3         +11.1
    33      34     144.9      154.3          +9.4
    36      35     147.7      158.6         +10.9
    37      38     142.4      143.0          +0.6
    40      39     138.7      143.6          +4.9
    41      42     131.1      143.2         +12.1
    44      43     141.6      145.3          +3.7
    45      46     145.0      150.1          +5.1
    48      47     155.4      154.0          −1.4
    49      50     151.1      149.3          −1.8
    52      51     145.6      149.7          +4.1
    53      54     146.3      158.5         +12.2

 Total             4251.3     4368.1
 Average            157.5      161.8          +4.3
 Per cent           100        102.7          +2.7
* These figures represent weights of the first and last sheaves on each half-drill strip added together, and are excluded in calculating the average
weights and also in calculating the “probable error”.
Further, it is obvious that there is a general decrease in fertility as we go from
drill strips with low numbers to drill strips with high numbers. It follows that the
difference A−C will tend to be greater when C follows A than when A follows C;
and since this is always possible, experiments of this nature should always be
planned so that there shall be an even number of differences, and the series should
begin and end with half-drill strips of the same variety. In this case we may
simply leave out the last drill strip and finish at half-drill strip 52.

There is also a curious feature about these figures which can only be put down
to some systematic error in technique: when we compare together the adjacent
half-drill strips of A, that with the higher number always yields higher, although
the general fertility runs the other way, and the same is true with regard to C in
eight cases out of thirteen.
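The curious feature just described can be checked directly from Table III. The sketch below counts, for each race, how often the higher-numbered strip of a contiguous pair is the heavier; the dictionary layout and the function names are our own.

```python
# Table III half-drill strip weights (lb.), keyed by strip number.
C = {1: 165.4, 4: 159.5, 5: 169.3, 8: 179.8, 9: 172.5, 12: 170.7, 13: 173.3,
     16: 166.1, 17: 174.5, 20: 163.3, 21: 166.0, 24: 161.2, 25: 169.3, 28: 156.5,
     29: 160.9, 32: 153.2, 33: 144.9, 36: 147.7, 37: 142.4, 40: 138.7, 41: 131.1,
     44: 141.6, 45: 145.0, 48: 155.4, 49: 151.1, 52: 145.6, 53: 146.3}
A = {2: 164.6, 3: 173.4, 6: 169.3, 7: 174.9, 10: 177.6, 11: 182.9, 14: 167.5,
     15: 178.5, 18: 170.3, 19: 176.0, 22: 159.1, 23: 168.7, 26: 164.2, 27: 167.0,
     30: 160.2, 31: 164.3, 34: 154.3, 35: 158.6, 38: 143.0, 39: 143.6, 42: 143.2,
     43: 145.3, 46: 150.1, 47: 154.0, 50: 149.3, 51: 149.7, 54: 158.5}


def adjacent_pairs(weights):
    """Yield (lower, higher) strip numbers for contiguous strips of one race."""
    for k in sorted(weights):
        if k + 1 in weights:
            yield k, k + 1


def higher_number_wins(weights):
    """Return (pairs where the higher-numbered strip is heavier, all pairs)."""
    pairs = list(adjacent_pairs(weights))
    wins = sum(weights[hi] > weights[lo] for lo, hi in pairs)
    return wins, len(pairs)


print("A:", higher_number_wins(A))
print("C:", higher_number_wins(C))
```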

These errors (that due to the general fertility slope and that due
to the different fertility of odd and even half-drill strips) are largely eliminated by
Beaven's arrangement, by which in alternate comparisons A follows C and C
follows A; and this can be made evident by adopting as unit not the difference
between adjacent half-drill strips but that between the sum of the two contiguous
half-drill strips of A and the sum of the two half-drill strips of C which enclose
them.

This may be described as a “sandwich”, and it may be noted that just as there
are subplots composing a half-drill strip, so there are “subsandwiches” which will
also tend to eliminate the same errors as the “sandwiches”.
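The sandwich differences of Table V can be recomputed from the half-drill strip weights of Table III. In the minimal sketch below (names hypothetical) strips 1 to 52 are taken four at a time in the order C, A, A, C, and each sandwich is the sum of the two inner A strips less the sum of the two C strips which enclose them.

```python
# Table III weights (lb.) for half-drill strips 1..52, in strip order.
# Within each group of four the varieties run C, A, A, C.
W = [165.4, 164.6, 173.4, 159.5, 169.3, 169.3, 174.9, 179.8,
     172.5, 177.6, 182.9, 170.7, 173.3, 167.5, 178.5, 166.1,
     174.5, 170.3, 176.0, 163.3, 166.0, 159.1, 168.7, 161.2,
     169.3, 164.2, 167.0, 156.5, 160.9, 160.2, 164.3, 153.2,
     144.9, 154.3, 158.6, 147.7, 142.4, 143.0, 143.6, 138.7,
     131.1, 143.2, 145.3, 141.6, 145.0, 150.1, 154.0, 155.4,
     151.1, 149.3, 149.7, 145.6]


def sandwiches(weights):
    """A - C for each block of four strips: the two inner A strips minus
    the two C strips enclosing them, rounded to 0.1 lb."""
    out = []
    for i in range(0, len(weights), 4):
        c1, a1, a2, c2 = weights[i:i + 4]
        out.append(round((a1 + a2) - (c1 + c2), 1))
    return out


print(sandwiches(W))
```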

The following table gives the differences A - C for the thirteen “sandwiches” 
composed of half-drill strips 1 to 52: 

Table V

 Half-drill strip   A − C      Half-drill strip   A − C
 numbers                       numbers
   1 to  4          +13.1        29 to 32         +10.4
   5 to  8          −4.9         33 to 36         +20.3
   9 to 12          +17.3        37 to 40         +5.5
  13 to 16          +6.6         41 to 44         +15.8
  17 to 20          +8.5         45 to 48         +3.7
  21 to 24          +0.6         49 to 52         +2.3
  25 to 28          +5.4


The mean A−C for sandwiches is +8.05, and the variance, making allowance
for the pitifully small number, is 51.41. This leads to a variance of the difference
between the total produce of A and of C, expressed in terms of the total weight
of C, of 0.398, intermediate between the 0.664 calculated from the half-drill strip
differences and the 0.301 calculated from the subplot differences.
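The figures in the preceding paragraph can be reproduced approximately as follows. We assume (the text does not spell this out) that 0.398 is the square of the standard error of the total A − C over the thirteen sandwiches, expressed as a percentage of the total weight of C on strips 1 to 52, and that the thirteen sandwich differences are independent. Recomputing from the rounded Table V entries gives a variance of about 51.48 rather than the printed 51.41, a small discrepancy that may be due to rounding.

```python
import math
import statistics

# Table V: A - C (lb.) for the thirteen sandwiches, strips 1-4, 5-8, ..., 49-52.
SANDWICH_DIFFS = [13.1, -4.9, 17.3, 6.6, 8.5, 0.6, 5.4,
                  10.4, 20.3, 5.5, 15.8, 3.7, 2.3]

# Total weight of C over strips 1-52: the Table III total less strip 53.
C_TOTAL = 4251.3 - 146.3

mean = statistics.mean(SANDWICH_DIFFS)
var = statistics.variance(SANDWICH_DIFFS)  # divisor n - 1, the "allowance"

# The sum of 13 independent sandwich differences has variance 13 * var.
se_total_pct = 100 * math.sqrt(len(SANDWICH_DIFFS) * var) / C_TOTAL

print(f"mean A-C          = {mean:.2f} lb.")
print(f"variance          = {var:.2f}")
print(f"SE of total, %    = {se_total_pct:.2f}")
print(f"square of that SE = {se_total_pct ** 2:.3f}")
```

On this reading, √0.398 ≈ 0.63 is the 0.63% quoted later in the section.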

It should be noted at this point that the “sandwich” is a perfectly legitimate
device for eliminating errors common to both variants whose difference is to be
measured, and that it is only by using it that we can get the true value of the error
of the comparison, whereas the subplot difference would really lead to a larger
value than 0.301 if we had sufficient knowledge to be able to apply the true
formula $\sigma^2\{1 + (n-1)r\}/n$.

A similar calculation based on the “subsandwiches”, i.e. sandwiches one plot
in depth, gives a value of the variance of 0.248, corresponding to the 0.398 from the
whole sandwiches. The standard deviation of the difference between these two figures,
which are to some extent correlated, is not easy to determine, but the difference must
be of the order of once the standard deviation. This is not significant, but with our small
numbers it is not inconsistent with the expected correlation between the
“subsandwiches” composing a sandwich. Until a number of experiments have been
carried out in several places and the results submitted to analysis, it would be wise
to keep the number of drill strips as large as possible and economize in length,
in spite of the practical difficulties of doing so.

Since the variance calculated from the drill-strip sandwiches is subject to a 
large error of random sampling owing to the necessary paucity of numbers, it is 
well to calculate also from the “subsandwiches” and take the larger of the two 
in determining the standard error. 

It is possible that some of my readers may devise some better method of
utilizing the weights of the “subplots” than I have been able to do, and I
commend the problem to them.

In the present case it is probably better, with only thirteen sandwiches, to take
the standard error of a single sandwich and use “Student's” tables, when the
probability that such a large positive difference should occur by chance is found
to be 0.001. The difference is therefore quite significant. If, however, it is required
to compare the standard error with other experiments, we can say that the most
probable value is only 0.63% on a total area of about 1 acre.
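The significance test in the paragraph above can be sketched as follows, assuming the thirteen sandwich differences of Table V are independent. Rather than consult printed tables, the sketch integrates the density of t with 12 degrees of freedom by Simpson's rule; the helper names are ours.

```python
import math
import statistics

SANDWICH_DIFFS = [13.1, -4.9, 17.3, 6.6, 8.5, 0.6, 5.4,
                  10.4, 20.3, 5.5, 15.8, 3.7, 2.3]


def t_density(x: float, n: int) -> float:
    """Density of Student's t with n degrees of freedom."""
    c = math.gamma((n + 1) / 2) / (math.sqrt(n * math.pi) * math.gamma(n / 2))
    return c * (1 + x * x / n) ** (-(n + 1) / 2)


def t_upper_tail(t: float, n: int, steps: int = 2000) -> float:
    """P(T > t) for t >= 0: one half minus the integral of the density on
    [0, t], computed by composite Simpson's rule (steps must be even)."""
    h = t / steps
    s = t_density(0.0, n) + t_density(t, n)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_density(i * h, n)
    return 0.5 - s * h / 3


n = len(SANDWICH_DIFFS)
mean = statistics.mean(SANDWICH_DIFFS)
se = statistics.stdev(SANDWICH_DIFFS) / math.sqrt(n)  # SE of the mean sandwich
t = mean / se
p = t_upper_tail(t, n - 1)
print(f"t = {t:.2f} on {n - 1} degrees of freedom, one-sided P = {p:.4f}")
```

The resulting t is about 4.04, and the one-sided probability falls between 0.0005 and 0.001, in line with the 0.001 given in the text.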

Other precautions, such as correction for moisture, etc., are taken as a matter
of course.

Conclusions 

The chief difficulty of comparing varieties consists in the fact that the
differences to be measured are quite small compared with the variations due to soil
and weather. While the latter is not within our control, the errors due to the soil
may be reduced to reasonable proportions in any one of three ways:

(1) Large plots may be repeated many times. An instance of this is given when,
in the Irish 2-acre experimental plots, a difference of 7% in the value per acre
was proved with a standard deviation of about 2% in 51 trials, extending over
six years.

Undertakings of this magnitude are hardly to be put in hand by any but 
Departments of State. 
(2) Quite small plots of one square yard, surrounded by a border of the same
variety as in the square yard, may be grown under a wire cage on a regular system,
technically called a “chessboard”. An instance of this is given when, in Beaven's
No. 1 Yield Experiment of 1913, eight varieties were compared on a total area of 
about xVth acre using about 5 oz. of seed of each variety, with a standard deviation 
of a comparison in a single year of about 3J%. 

The large number of varieties which may be compared at once, and the small 
area which is required, make this an ideal method of testing new varieties. On 
the other hand, a wire cage is not a cornfield, and the varieties found to be best 
in the cage will always require further testing on the large scale. The method is, 
however, within the powers of anyone who can build a cage, and has the necessary 
skill and patience to conduct the experiments. 

(3) By means of Beaven’s “half-drill strip” method, two varieties may be 
compared on a total area of about one acre in one year with a standard deviation 
of a comparison of less than 1%. This combines the advantage of growing corn 
on the large scale with an accuracy almost as great as that of small-scale work; 
and is within the powers of anyone who can combine the necessary knowledge 
and patience with the control of skilled agricultural labour. 

It is shown that methods (2) and (3) depend for their accuracy on the fact that
the nearer two plots of ground are situated, the more highly are the yields
correlated, so that we are able to increase the effect of the last term of the equation

$\sigma^2_{A-B} = \sigma^2_A + \sigma^2_B - 2 r_{AB}\,\sigma_A \sigma_B$

(where A and B are the varieties to be compared) by placing the plots to be
compared with one another as near together as possible.
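A small numerical illustration of the equation above, with hypothetical standard deviations of 5 (arbitrary units) for both plots: the more highly the two yields are correlated, the smaller the variance of their difference.

```python
import math


def var_of_difference(sd_a: float, sd_b: float, r_ab: float) -> float:
    """sigma^2_{A-B} = sigma^2_A + sigma^2_B - 2 r_AB sigma_A sigma_B."""
    return sd_a ** 2 + sd_b ** 2 - 2 * r_ab * sd_a * sd_b


for r in (0.0, 0.5, 0.9):
    v = var_of_difference(5.0, 5.0, r)
    print(f"r = {r:.1f}: variance {v:5.1f}, standard deviation {math.sqrt(v):.2f}")
```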

A formula, due to Mr R. A. Fisher, is given for calculating the error of a 
comparison in a “chessboard” experiment, which may perhaps be found useful 
elsewhere. 

Finally I have to thank Dr Beaven both for allowing me to use his experimental 
material and for much invaluable assistance in the preparation of the paper. 

Addendum 

Since writing the above I have had the advantage of witnessing the harvesting 
of Dr Beaven’s 1923 experiment and of discussing the whole question with him 
very thoroughly. 

He thinks it probable that the whole or a part of the correlation between the 
yields of the “plots” which together formed a drill strip in the 1921 experiment 
may have been due to slight differences in area consequent on irregular steering of 
the seed drill, such as would have been caused by the horses pulling unequally. 

Measurements which we made on the stubble of the similar 1923 experiment 
showed not only that such inaccuracies occur, but also that they can favour one 
of the varieties. 


It is, however, a fairly easy matter after harvest to measure the total width
from the outside drill of one half-drill strip to the outside drill of the same variety. 
This measurement includes the space between the drill strips, which is variable 
owing to the difficulty of steering and is now made in practice across each drill 
strip in several places. 

It is thus possible to estimate accurately the total area occupied by each 
variety and to make the necessary correction to the total yields. 

As, however, it would hardly be possible to correct the individual drill strips 
or “plots” which are used for the purpose of calculating the error, that calculated
error will be in excess of the truth. 

In Dr Beaven's opinion the operation of taking differences has for all practical
purposes eliminated the correlation due to the position of the plots, and in
view of the other causes of variation in the differences, numerous and diverse
as they are, he still considers it legitimate to treat the differences between the
“plots” as if they were random, and to use the formula $\sigma/\sqrt{n}$ in calculating the
error of his mean difference. I feel, however, that a single operation of this nature
is hardly likely to eliminate all the correlation and that there is need for further 
inquiry: if as the result of a number of experiments it is found that the error of 
the mean difference calculated from the weights of the half-drill strips is not 
significantly greater than that calculated from the “plots”, then the latter 
undoubtedly provide the more accurate data for the calculation of that error, 
and it will be a matter of indifference whether the drill strips be few and long or 
short and many. 

Meanwhile they should be made as numerous as is consistent with the
successful carrying out of the various agricultural operations, which are of course
made infinitely more difficult and tedious by the necessity of turning horses and 
machines at the end of each short length. 

But whether we use few long or many short strips is not a question of the first 
importance: in either case the method is without doubt the best that has hitherto 
been devised for large-scale experiments. 

Later Note 

The following note relating to a paragraph on pp. 98-9 above was included 
by Student in the next volume of Biometrika (xvi (1924), p. 411): 

I wish to apologize to the readers of Biometrika for having allowed it to appear 
that I was the author of the term “Variance” defined as the square of the 
Standard Deviation. It was first used by Mr R. A. Fisher in 1918 in a paper
entitled “The Correlation between relatives on the Supposition of the Mendelian
Inheritance”, Trans. Roy. Soc. Edin. lii, 2, pp. 399-433; and he has published
many papers since in which the word has been used.
NEW TABLES FOR TESTING THE SIGNIFICANCE
OF OBSERVATIONS 

[Metron, v (1925), p. 105]


In Biometrika, vi, pp. 1-25 [2] it was suggested that if z = x/s, where x is the distance
of the mean of a sample of n from the true mean of a normally distributed
population, and s is the standard deviation of the same sample, i.e.

$s^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2,$

then the frequency of z is given by the frequency curve

$y = \frac{\Gamma\left(\frac{n}{2}\right)}{\sqrt{\pi}\,\Gamma\left(\frac{n-1}{2}\right)}\,(1 + z^2)^{-n/2},$

and that consequently the integral

$P = \frac{\Gamma\left(\frac{n}{2}\right)}{\sqrt{\pi}\,\Gamma\left(\frac{n-1}{2}\right)} \int_{-\infty}^{z} (1 + z^2)^{-n/2}\,dz$

gives the probability that the mean of a sample of n drawn from a normally
distributed population, measured in terms of the standard deviation of the
sample, shall not exceed the value z.

Tables were constructed for values of n from 4 to 10 [2, p. 29], and
subsequently, in Biometrika, xi, p. 416 [8, pp. 62-3], from 2 to 30.

It has since been shown, as in the preceding paper by Mr Fisher (Metron, v 
(1925), pp. 90-104), that the suggestion was in fact justified, and that the integral 
has a much wider application than was originally supposed. 

The tables hitherto published suffer, however, from two defects: (i) as
n increases the z scale becomes very coarse, and (ii) except in the case for
which it was designed, n, the number in the sample, is not the best number
under which to enter the table, but n − 1, the number of degrees of freedom.

The present tables have, therefore, at Mr Fisher's suggestion been constructed
with argument $t = z\sqrt{n}$, where n is now one less than the number in the sample,
which we may call n′. They correspond to Sheppard's table when that is used
to test the significance of the mean of a large number of observations.
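To see that the change of argument leaves the probabilities unchanged, the sketch below integrates the 1908 curve in z for a sample of n′ and the curve in t with n = n′ − 1 degrees of freedom, and compares them at $t = z\sqrt{n}$. It uses Simpson's rule; the function names are ours, and the check is only numerical.

```python
import math


def simpson(f, a: float, b: float, steps: int = 2000) -> float:
    """Composite Simpson's rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    s = f(a) + f(b)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3


def p_of_z(z: float, n_sample: int) -> float:
    """1908 form: probability integral of the z curve for a sample of
    n_sample observations, for z >= 0."""
    n = n_sample
    c = math.gamma(n / 2) / (math.sqrt(math.pi) * math.gamma((n - 1) / 2))
    return 0.5 + simpson(lambda x: c * (1 + x * x) ** (-n / 2), 0.0, z)


def p_of_t(t: float, n: int) -> float:
    """1925 form: probability integral of t with n degrees of freedom, t >= 0."""
    c = math.gamma((n + 1) / 2) / (math.sqrt(n * math.pi) * math.gamma(n / 2))
    return 0.5 + simpson(lambda x: c * (1 + x * x / n) ** (-(n + 1) / 2), 0.0, t)


# A sample of n' = 5 has n = 4 degrees of freedom; z = 0.5 maps to t = 1.0.
z, n_prime = 0.5, 5
n = n_prime - 1
print(p_of_z(z, n_prime), p_of_t(z * math.sqrt(n), n))
```

Both values agree with the entry .8130 given in Table I under n = 4 at t = 1.0.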

Table I extends from t = 0 to t = 6, at intervals of 0.1, from n = 1 to n = 20
inclusive; in each column in which values of more than 0.99995 occur, the first
of these is written 1.0000, and further values are not given.


Table II gives values beyond t = 6, to six places of decimals, from which values
accurate to four places of decimals can be calculated by proportional
interpolation. The intervals are, therefore, unequal, and increase as t becomes larger.
In this table no values are given under n = 1 and n = 2, as these can be easily
calculated from the ordinary trigonometrical tables by the formulae


n = 1:  $p = \frac{1}{2} + \frac{\theta}{\pi}$, where $\tan\theta = t$;

n = 2:  $p = \frac{1}{2} + \frac{1}{2}\sin\theta$, where $\tan\theta = t/\sqrt{2}$.
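The two closed forms above are easy to evaluate directly. The sketch below (function names ours) does so, and its output can be compared with the n = 1 and n = 2 columns of Table I, e.g. .7500 and .7887 at t = 1.0, or .9474 and .9867 at t = 6.0.

```python
import math


def p_n1(t: float) -> float:
    """n = 1: p = 1/2 + theta/pi with tan(theta) = t."""
    return 0.5 + math.atan(t) / math.pi


def p_n2(t: float) -> float:
    """n = 2: p = 1/2 + (1/2) sin(theta) with tan(theta) = t / sqrt(2)."""
    return 0.5 + 0.5 * math.sin(math.atan(t / math.sqrt(2)))


for t in (1.0, 2.0, 6.0):
    print(f"t = {t}: n = 1 -> {p_n1(t):.4f}, n = 2 -> {p_n2(t):.4f}")
```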


Table III gives coefficients for calculating the difference between the value for
n = ∞, i.e. Sheppard's table, and that for n, where p is arrived at by the formula


p=p 

CO 


n n 2 n 2 


n' i ‘ 


This gives values of p, estimated to be accurate to 0.000005, when n is greater
than 20; in fact, at 20 and 24 the following differences were found:


 Values of t           0.5  1.5  2.0  2.5  3.0  3.5  4.0  4.5  5.0  5.5  6.0  6.5  7.0
 Differences, n = 20     0    0    0    0  +23  +17  −33  −46  −30  +41  +19   +9   +4
 Differences, n = 24     —    —    —    —   +8   +9  −14  −18   −4  +20   +7    —    —


The above differences are in the seventh place of decimals, and are between 
values of p given by the approximation and those derived from the cosine 
formula using seven-place tables. Mr Fisher’s note (Metron, v (1925), pp. 109-12) 
explains the basis on which the coefficients were calculated. 
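We take the “cosine formula” to be the finite trigonometric series for the probability integral of t, in which $\tan\theta = t/\sqrt{n}$ and the number of terms grows with n (about ½n of them). That identification is our assumption, and the form used below is the one found in standard references rather than a transcription of the Biometrika paper; it reduces to the n = 1 and n = 2 formulae given earlier.

```python
import math


def p_cosine(t: float, n: int) -> float:
    """Probability integral of t with n degrees of freedom (t >= 0), via the
    finite series in cos(theta), where tan(theta) = t / sqrt(n)."""
    theta = math.atan(t / math.sqrt(n))
    s, c = math.sin(theta), math.cos(theta)
    if n % 2 == 0:
        # sin(theta) * [1 + (1/2)c^2 + (1*3)/(2*4)c^4 + ...], n/2 terms
        term, total = 1.0, 1.0
        for k in range(1, n // 2):
            term *= (2 * k - 1) / (2 * k) * c * c
            total += term
        a = s * total
    elif n == 1:
        a = 2 * theta / math.pi
    else:
        # (2/pi) * (theta + sin(theta) * [c + (2/3)c^3 + ...]), (n-1)/2 terms
        term, total = c, c
        for k in range(1, (n - 1) // 2):
            term *= (2 * k) / (2 * k + 1) * c * c
            total += term
        a = 2 / math.pi * (theta + s * total)
    # a is the probability of lying between -t and +t; convert to p.
    return 0.5 * (1 + a)


for n, t in [(3, 1.0), (5, 1.0), (10, 2.0)]:
    print(f"n = {n}, t = {t}: p = {p_cosine(t, n):.4f}")
```

The printed values can be checked against the corresponding Table I entries (.8045, .8184 and .9633).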

The methods of calculating and checking the tables were as follows: 

1. Values of p for

t = 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0,
and n = 1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 15, 20, 24,

were calculated from the cosine formula (Biometrika, vi, p. 10 [2, p. 21]), using
seven-figure tables; these values, though they are the sum of ½n terms, appear
to be accurate within about 0.0000003, and were checked by recalculation.
They were also compared with the values obtained by the use of Table III,
which both served as a further check and showed within what limits
Table III could be used for constructional purposes.

2. From the values thus calculated under n = 6, 8, 12, 24, together with
n = ∞ (i.e. Sheppard), the remaining frame values under n = 7, 9, 11, 13, 14, 15,
16, 17, 18, 19 were interpolated by coefficients calculated by Mr Fisher for
asymptotic interpolation. These were checked by recalculation and
cross-differencing, i.e. by comparing the difference $p_n - p_{n-1}$ with $p_{n+1} - p_n$ for the
same values of t, and any doubtful values were recalculated by the cosine formula,
as also were any values in which the fourth place of decimals was doubtful, i.e.
whenever the fifth place of decimals was 4 or 5.

3. Having thus obtained a frame, this was filled in to five places of decimals
in three ways. (a) By interpolation, using where necessary both four- and six-point
central interpolation. It was found that over the greater part of the table the
true values lie between the four-point and the six-point interpolation, but for
high values of t it was usually sufficient to use four-point, six-point being required
only to locate doubtful values. (b) For very low values of n (= 1, 2 and 3) the
frame was found not to be sufficiently close with low values of t, and alternate
values had to be calculated from the cosine formula, the remaining odd values
being interpolated by four- or six-point central interpolation. (c) As n increased
it was found possible to make more and more use of Table III, beginning at
n = 4 with values of t less than 1 and ending at n = 20 with the whole table.
These values were recalculated as a check. Second differences were then taken
down the columns and any doubtful figures checked from the cosine formula;
as before, this was done whenever the fourth figure was in any doubt.
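The four- and six-point central interpolation mentioned in step 3 can be illustrated at a midpoint between two tabular values. We assume the usual Lagrange midpoint formulae, with weights (−1, 9, 9, −1)/16 and (3, −25, 150, 150, −25, 3)/256; the text does not say which form was used. The inputs are the n = 1 entries of Table I for t = 0.0 to 0.5, and the exact value comes from the n = 1 closed form given earlier.

```python
import math


def four_point_mid(f):
    """Value midway between f[1] and f[2], from four equally spaced values."""
    return (-f[0] + 9 * f[1] + 9 * f[2] - f[3]) / 16


def six_point_mid(f):
    """Value midway between f[2] and f[3], from six equally spaced values."""
    return (3 * f[0] - 25 * f[1] + 150 * f[2] + 150 * f[3]
            - 25 * f[4] + 3 * f[5]) / 256


# Table I, n = 1, at t = 0.0, 0.1, ..., 0.5 (four decimals, as printed).
col = [0.5000, 0.5317, 0.5628, 0.5928, 0.6211, 0.6476]

four = four_point_mid(col[1:5])  # uses t = 0.1 to 0.4
six = six_point_mid(col)         # uses t = 0.0 to 0.5
exact = 0.5 + math.atan(0.25) / math.pi
print(f"t = 0.25: four-point {four:.5f}, six-point {six:.5f}, exact {exact:.5f}")
```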

Finally, the whole table was cross-differenced, and a very large number of 
values were recalculated from the cosine formula. Very few alterations were, 
however, found to be necessary. 

Table II was altogether calculated from the cosine formula; as it is designed 
to give an accuracy of four figures by proportional interpolation, it was possible 
to increase the interval between the t entries as t increases. 

Table III was calculated from Mr Fisher’s formulae, and I have to thank 
Miss W. A. Mackenzie, M.Sc., of the Rothamsted Statistical Laboratory for 
kindly checking this part of the work. 





Table I. The Probability Integral of t

 t    n=1    n=2    n=3    n=4    n=5    n=6    n=7    n=8    n=9    n=10
      n′=2   n′=3   n′=4   n′=5   n′=6   n′=7   n′=8   n′=9   n′=10  n′=11
0.0   .5000  .5000  .5000  .5000  .5000  .5000  .5000  .5000  .5000  .5000
0.1   .5317  .5353  .5367  .5374  .5379  .5382  .5384  .5386  .5387  .5388
0.2   .5628  .5700  .5729  .5744  .5753  .5760  .5764  .5768  .5770  .5773
0.3   .5928  .6038  .6081  .6104  .6119  .6129  .6136  .6141  .6145  .6148
0.4   .6211  .6361  .6420  .6452  .6472  .6485  .6495  .6502  .6508  .6512
0.5   .6476  .6667  .6743  .6783  .6809  .6826  .6838  .6847  .6855  .6861
0.6   .6720  .6953  .7046  .7096  .7127  .7148  .7163  .7174  .7183  .7191
0.7   .6944  .7218  .7328  .7387  .7424  .7449  .7467  .7481  .7492  .7501
0.8   .7148  .7462  .7589  .7657  .7700  .7729  .7750  .7766  .7778  .7788
0.9   .7333  .7684  .7828  .7905  .7953  .7986  .8010  .8028  .8042  .8054
1.0   .7500  .7887  .8045  .8130  .8184  .8220  .8247  .8267  .8283  .8296
1.1   .7651  .8070  .8242  .8335  .8393  .8433  .8461  .8483  .8501  .8514
1.2   .7789  .8235  .8419  .8518  .8581  .8623  .8654  .8678  .8696  .8711
1.3   .7913  .8384  .8578  .8683  .8748  .8793  .8826  .8851  .8870  .8886
1.4   .8026  .8518  .8720  .8829  .8898  .8945  .8979  .9005  .9025  .9041
1.5   .8128  .8638  .8847  .8960  .9030  .9079  .9114  .9140  .9161  .9177
1.6   .8222  .8746  .8960  .9076  .9148  .9196  .9232  .9259  .9280  .9297
1.7   .8307  .8844  .9062  .9178  .9251  .9300  .9335  .9362  .9383  .9400
1.8   .8386  .8932  .9152  .9269  .9341  .9390  .9426  .9452  .9473  .9490
1.9   .8458  .9011  .9232  .9349  .9421  .9469  .9504  .9530  .9551  .9567
2.0   .8524  .9082  .9303  .9419  .9490  .9538  .9572  .9597  .9617  .9633
2.1   .8585  .9147  .9367  .9482  .9551  .9598  .9631  .9655  .9674  .9690
2.2   .8642  .9206  .9424  .9537  .9605  .9649  .9681  .9705  .9723  .9738
2.3   .8695  .9259  .9475  .9585  .9651  .9694  .9725  .9748  .9765  .9779
2.4   .8743  .9308  .9521  .9628  .9692  .9734  .9763  .9784  .9801  .9813
2.5   .8789  .9352  .9561  .9666  .9728  .9767  .9795  .9815  .9831  .9843
2.6   .8831  .9392  .9598  .9700  .9759  .9797  .9823  .9842  .9856  .9868
2.7   .8871  .9429  .9631  .9730  .9786  .9822  .9847  .9865  .9878  .9888
2.8   .8908  .9463  .9661  .9756  .9810  .9844  .9867  .9884  .9896  .9906
2.9   .8943  .9494  .9687  .9779  .9831  .9863  .9885  .9901  .9912  .9921
3.0   .8976  .9523  .9712  .9800  .9850  .9880  .9900  .9915  .9925  .9933
3.1   .9007  .9549  .9734  .9819  .9866  .9894  .9913  .9927  .9936  .9944
3.2   .9036  .9573  .9753  .9835  .9880  .9907  .9925  .9937  .9946  .9953
3.3   .9063  .9596  .9771  .9850  .9893  .9918  .9934  .9946  .9954  .9960
3.4   .9089  .9617  .9788  .9864  .9904  .9928  .9943  .9953  .9961  .9966
3.5   .9114  .9636  .9803  .9876  .9914  .9936  .9950  .9960  .9966  .9971
3.6   .9138  .9654  .9816  .9886  .9922  .9943  .9956  .9965  .9971  .9976
3.7   .9160  .9670  .9829  .9896  .9930  .9950  .9962  .9970  .9975  .9979
3.8   .9181  .9686  .9840  .9904  .9937  .9955  .9966  .9974  .9979  .9983
3.9   .9201  .9701  .9850  .9912  .9943  .9960  .9971  .9977  .9982  .9985
4.0   .9220  .9714  .9860  .9919  .9948  .9964  .9974  .9980  .9984  .9987
4.1   .9239  .9727  .9869  .9926  .9953  .9968  .9977  .9983  .9987  .9989
4.2   .9256  .9739  .9877  .9932  .9958  .9972  .9980  .9985  .9988  .9991
4.3   .9273  .9750  .9884  .9937  .9961  .9975  .9982  .9987  .9990  .9992
4.4   .9289  .9760  .9891  .9942  .9965  .9977  .9984  .9989  .9991  .9993
4.5   .9304  .9770  .9898  .9946  .9968  .9979  .9986  .9990  .9993  .9994
4.6   .9319  .9779  .9903  .9950  .9971  .9982  .9988  .9991  .9994  .9995
4.7   .9333  .9788  .9909  .9953  .9973  .9983  .9989  .9992  .9994  .9996
4.8   .9346  .9796  .9914  .9957  .9976  .9985  .9990  .9993  .9995  .9996
4.9   .9359  .9804  .9919  .9960  .9978  .9986  .9991  .9994  .9996  .9997
5.0   .9372  .9811  .9923  .9963  .9979  .9988  .9992  .9995  .9996  .9997
5.1   .9384  .9818  .9927  .9965  .9981  .9989  .9993  .9995  .9997  .9998
5.2   .9395  .9825  .9931  .9967  .9983  .9990  .9994  .9996  .9997  .9998
5.3   .9406  .9831  .9934  .9970  .9984  .9991  .9994  .9996  .9998  .9998
5.4   .9417  .9837  .9938  .9972  .9985  .9992  .9995  .9997  .9998  .9998
5.5   .9428  .9842  .9941  .9973  .9986  .9992  .9995  .9997  .9998  .9999
5.6   .9438  .9848  .9944  .9975  .9987  .9993  .9996  .9997  .9998  .9999
5.7   .9447  .9853  .9946  .9977  .9988  .9994  .9996  .9998  .9999  .9999
5.8   .9457  .9858  .9949  .9978  .9989  .9994  .9997  .9998  .9999  .9999
5.9   .9466  .9862  .9951  .9979  .9990  .9995  .9997  .9998  .9999  .9999
6.0   .9474  .9867  .9954  .9981  .9991  .9995  .9997  .9998  .9999  .9999
 t    n=11   n=12   n=13   n=14   n=15   n=16   n=17   n=18   n=19   n=20   n=∞
      n′=12  n′=13  n′=14  n′=15  n′=16  n′=17  n′=18  n′=19  n′=20  n′=21
0.0   .5000  .5000  .5000  .5000  .5000  .5000  .5000  .5000  .5000  .5000  .5000000
0.1   .5389  .5390  .5391  .5391  .5392  .5392  .5392  .5393  .5393  .5393  .5398278
0.2   .5774  .5776  .5777  .5778  .5779  .5780  .5781  .5781  .5782  .5782  .5792597
0.3   .6151  .6153  .6155  .6157  .6159  .6160  .6161  .6162  .6163  .6164  .6179114
0.4   .6516  .6519  .6522  .6524  .6526  .6528  .6529  .6531  .6532  .6533  .6554217
0.5   .6865  .6869  .6873  .6876  .6878  .6881  .6883  .6884  .6886  .6887  .6914625
0.6   .7197  .7202  .7206  .7210  .7213  .7215  .7218  .7220  .7222  .7224  .7257469
0.7   .7508  .7514  .7519  .7523  .7527  .7530  .7533  .7536  .7538  .7540  .7580363
0.8   .7797  .7804  .7810  .7815  .7819  .7823  .7826  .7829  .7832  .7834  .7881446
0.9   .8063  .8071  .8078  .8083  .8088  .8093  .8097  .8100  .8103  .8106  .8159399
1.0   .8306  .8315  .8322  .8329  .8334  .8339  .8343  .8347  .8351  .8354  .8413447
1.1   .8526  .8535  .8544  .8551  .8557  .8562  .8567  .8571  .8575  .8578  .8643339
1.2   .8723  .8734  .8742  .8750  .8756  .8762  .8767  .8772  .8776  .8779  .8849303
1.3   .8899  .8910  .8919  .8927  .8934  .8940  .8945  .8950  .8954  .8958  .9031995
1.4   .9055  .9066  .9075  .9084  .9091  .9097  .9103  .9107  .9112  .9116  .9192433
1.5   .9191  .9203  .9212  .9221  .9228  .9235  .9240  .9245  .9250  .9254  .9331928
1.6   .9310  .9322  .9332  .9340  .9348  .9354  .9360  .9365  .9370  .9374  .9452007
1.7   .9414  .9426  .9435  .9444  .9451  .9458  .9463  .9468  .9473  .9477  .9554345
1.8   .9503  .9515  .9525  .9533  .9540  .9546  .9552  .9557  .9561  .9565  .9640697
1.9   .9580  .9591  .9601  .9609  .9616  .9622  .9627  .9632  .9636  .9640  .9712834
2.0   .9646  .9657  .9666  .9674  .9680  .9686  .9691  .9696  .9700  .9704  .9772499
2.1   .9702  .9712  .9721  .9728  .9735  .9740  .9745  .9750  .9753  .9757  .9821356
2.2   .9750  .9759  .9768  .9774  .9781  .9786  .9790  .9794  .9798  .9801  .9860966
2.3   .9790  .9799  .9807  .9813  .9819  .9824  .9828  .9832  .9835  .9838  .9892759
2.4   .9824  .9832  .9840  .9846  .9851  .9855  .9859  .9863  .9866  .9869  .9918025
2.5   .9852  .9860  .9867  .9873  .9877  .9882  .9885  .9888  .9891  .9894  .9937903
2.6   .9877  .9884  .9890  .9895  .9900  .9903  .9907  .9910  .9912  .9914  .9953388
2.7   .9897  .9903  .9909  .9914  .9918  .9921  .9924  .9927  .9929  .9931  .9965330
2.8   .9914  .9920  .9925  .9929  .9933  .9936  .9938  .9941  .9943  .9945  .9974449
2.9   .9928  .9933  .9938  .9942  .9945  .9945  .9950  .9952  .9954  .9956  .9981342
3.0   .9940  .9945  .9949  .9952  .9955  .9958  .9960  .9962  .9963  .9965  .9986501
3.1   .9949  .9954  .9958  .9961  .9963  .9966  .9967  .9969  .9971  .9972  .9990324
3.2   .9958  .9962  .9965  .9968  .9970  .9972  .9974  .9975  .9976  .9978  .9993129
3.3   .9965  .9968  .9971  .9974  .9976  .9977  .9979  .9980  .9981  .9982  .9995166
3.4   .9970  .9974  .9976  .9978  .9980  .9982  .9983  .9984  .9985  .9986  .9996631
3.5   .9975  .9978  .9980  .9982  .9984  .9985  .9986  .9987  .9988  .9989  .9997674
3.6   .9979  .9982  .9984  .9986  .9987  .9988  .9989  .9990  .9990  .9991  .9998409
3.7   .9982  .9985  .9987  .9988  .9989  .9990  .9991  .9992  .9992  .9993  .9998922
3.8   .9985  .9987  .9989  .9990  .9991  .9992  .9993  .9993  .9994  .9994  .9999277
3.9   .9988  .9989  .9991  .9992  .9993  .9994  .9994  .9995  .9995  .9996  .9999519
4.0   .9990  .9991  .9992  .9993  .9994  .9995  .9995  .9996  .9996  .9996  .9999683
4.1   .9991  .9993  .9994  .9995  .9995  .9996  .9996  .9997  .9997  .9997  .9999793
4.2   .9993  .9994  .9995  .9996  .9996  .9997  .9997  .9997  .9998  .9998  .9999867
4.3   .9994  .9995  .9996  .9996  .9997  .9997  .9998  .9998  .9998  .9998  .9999915

• 999,5 

• 999,6 

• 999,6 

• 999,7 

• 999,7 

• 999,8 

• 999,8 

• 999,8 

• 999,8 

• 999,9 

• 999 , 994,6 

4-4 

• 999,5 

■ 999,6 

• 999,7 

• 999,8 

• 999,8 

• 999,8 

• 999,8 

• 999,9 

• 999,9 

• 999,9 

■ 999 , 996,6 

4-5 

• 999,6 

• 999,7 

• 999,8 

• 999,8 

• 999,8 

• 999,9 

• 999,9 

• 999,9 

• 999,9 

• 999,9 

• 999 , 997,9 

4-6 

• 999,7 

• 999,7 

• 999,8 

• 999,8 

• 999,9 

• 999,9 

• 999,9 

• 999,9 

• 999,9 

• 999,9 

• 999 , 998,7 

4-7 

• 999,7 

• 999,8 

• 999,8 

• 999,9 

• 999,9 

• 999,9 

• 999,9 

• 999,9 

• 999,9 

• 999,9 

• 999 , 999,2 

4-8 

• 999,8 

• 999,8 

• 999,9 

• 999,9 

• 999,9 

• 999,9 

• 999,9 

• 999,9 

1 - 000,0 

1 - 000,0 

• 999 , 999,5 

4-9 

• 999,8 

• 999,8 

• 999,9 

• 999,9 

• 999,9 

• 999,9 

• 999,9 

1 - 000,0 



• 999 , 999,7 

5-0 

• 999,8 

• 999,9 

• 999,9 

• 999,9 

• 999,9 

• 999,9 

1 - 000,0 




• 999 , 999,8 

51 

• 999,9 

• 999,9 

• 999,9 

• 999,9 

• 999,9 

1 - 000,0 





• 999 , 999,9 

5-2 

• 999,9 

• 999,9 

• 999,9 

• 999,9 

1 - 000,0 






• 999 , 999,9 

5-3 

• 999,9 

• 999,9 

• 999,9 

1 - 000,0 







1 - 000 , 000,0 

5-4 

• 999,9 

• 999,9 

• 999,9 









5-5 

• 999,9 

• 999,9 

1 - 000,0 









5-6 

• 999,9 

1 - 000,0 










5-7 

• 999,9 











5-8 

• 999,9 











5-9 

1 - 000,0 











6-0 


Note. n=n'— 1 is the number of degrees of freedom used in the estimate of variance. 
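The entries of Table I can be checked against a direct numerical integration of Student's distribution. The sketch below is our own illustration and not Student's method of computation. It assumes that each entry is the probability that t does not exceed the tabulated value for n degrees of freedom. It integrates the density with Simpson's rule; the function names `t_density` and `t_cdf` and the step count are arbitrary choices. It then prints a few values for comparison with the table.

```python
import math


def t_density(x: float, df: int) -> float:
    """Density of Student's t distribution with `df` degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(t: float, df: int, steps: int = 2000) -> float:
    """P(T <= t): the half of the area below zero plus Simpson's rule on [0, t]."""
    if t < 0:
        return 1.0 - t_cdf(-t, df, steps)
    if t == 0:
        return 0.5
    h = t / steps  # `steps` must be even for Simpson's rule
    total = t_density(0.0, df) + t_density(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return 0.5 + total * h / 3


# Compare with Table I: (t, n, tabled value)
for t, n, tabled in [(1.2, 11, 0.8723), (2.0, 11, 0.9646), (2.0, 20, 0.9704), (3.0, 15, 0.9955)]:
    print(f"t={t}, n={n}: computed {t_cdf(t, n):.4f}, tabled {tabled}")
```

Simpson's rule is used only because it keeps the sketch dependency-free. Any library routine for the t distribution would serve equally well.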
Table II. Supplementary table for high values of t


t      n = 3    n = 4    n = 5    n = 6    n = 7    n = 8    n = 9    n = 10
       n′ = 4   n′ = 5   n′ = 6   n′ = 7   n′ = 8   n′ = 9   n′ = 10  n′ = 11

6·0

6·5

7·0

7·5

8·0

• 995,364 

• 996,303 

• 997,007 

• 997,544 

• 997,962 

• 998,059 

• 998,555 

• 998,904 

• 999,155 

• 999,338 

• 999,077 

• 999,357 

• 999,542 

• 999,754 

• 999,518 

• 999,684 

• 999,788 

• 999,898 

• 999,729 

• 999,833 

• 999,894 

• 999,954 

• 999,838 

• 999,906 

• 999,944 

• 999,965 

• 999,899 

• 999,968 

• 999,934 

• 999,966

8·5

9·0
10·0
11·0
12·0

• 998,290 

• 998,552 

• 998,936 

• 999,196 

• 999,377 

• 999,578 

• 999,719 

• 999,862 

• 999,859 

• 999,915 

• 999,965 

• 999,947

• 999,971 





14·0

16·0

20·0

24·0

28·0

• 999,605 

• 999,735 

• 999,863 

• 999,921 

• 999,950 

• 999,924 

• 999,955 








Linear interpolation between adjacent entries will give four figure accuracy.


Table III 


t      c1          c2          c3        c4          t      c1          c2          c3        c4

01 

• 010 , 023,1 

-• 001,261 

_ 

• 001,55 

+ - 000,4 

3-1 

• 026 , 862,2 

+ • 207,289 

+ • 028,49 

- 1 - 368,5 

0-2 

• 020 , 334,2 

-• 002,616 

— 

• 003,08 

• 000,8 

3-2 

• 021 , 437,7 

• 193,351 

• 144,23

- 1 - 610,8

0-3 

• 031 , 178,5 

-• 004,177 

— 

• 004,53 

• 001,3 

3-3 

• 016 , 897,1 

• 176,859 

• 254,58 

- 1 - 837,2 

0-4 

• 042 , 719,3 

-• 006,087 

— 

• 005,86 

• 001,7 

3-4 

• 013 , 155,2 

• 158,774 

• 353,27 

- 2 - 034,8 

0-5 

• 055 , 010,2 

-• 008,509 

— 

• 006,97 

• 002,2 

3-5 

• 010 , 117,7 

• 139,969 

• 435,35 

- 2 - 732,8 

06 

• 067 , 977,8 

-• 011,595 

_ 

• 007,72 

• 002,6 

3-6 

• 007 , 687,9 

• 121,313 

• 497,49 

- 2 - 503,1 

0-7 

• 081 , 420,2 

-• 015,432 

— 

• 007,96 

• 002,8 

3-7 

• 005 , 772,2 

• 103,371 

• 538,13 

- 2 - 123,7

0-8 

• 095 , 018,8 

-• 019,991 

— 

• 007,74 

• 002,6 

3-8 

• 004 , 282,3 

• 086,649 

• 557,33 

- 1 - 625,7 

0-9 

• 108 , 363,2 

-• 025,066 

— 

• 006,51 

• 002,1 

3-9 

• 003 , 139,7 

• 071,486 

• 556,67 

- 1 - 050,1 

1-0 

• 120 , 985,4 

-• 030,246 

- 

• 005,04 

• 001,1 

4-0

• 002 , 275,1 

• 058,066 

• 538,80 

- - 441,7 

11 

• 132 , 399,7 

-• 034,907 

_ 

• 003,76 

• 000,4 

4-1 

• 001 , 629,5 

• 046,453 

• 507,08 

+ - 155,5 

12 

• 142 , 144,2 

-• 038,248 

— 

• 003,75 

• 001,1 

4-2 

• 001 , 153,8 

• 036,613 

• 465,19 

• 702,6 

1-3 

• 149 , 819,0 

-• 039,363 

— 

• 006,56 

• 005,8 

4-3 

• 000 , 807,4 

• 028,438 

• 416,81 

1 - 169,2 

1-4 

• 155 , 117,7 

-• 037,344 

— 

• 014,10 

• 018,2 

4-4

• 000 , 558,6 

• 021,773 

• 365,31 

1 - 535,3 

1-5 

• 157 , 849,6 

-• 031,399 

— 

• 028,41 

• 043,0 

4-5 

• 000 , 382,1 

• 016,436 

• 313,56

1 - 791,6 

1-6 

• 157 , 951,2 

-• 020,971 

_ 

• 051,29 

• 084,9 

4-6 

• 000 , 258,4 

• 012,235 

• 263,86 

1 - 938,7 

1-7 

• 155 , 486,7 

-• 005,832 

— 

• 083,84 

• 161,2 

4-7 

• 000 , 172,8 

• 008,984 

• 217,87 

1 - 985,3 

1-8 

• 150 , 637,0 

+ • 013,846 

— 

• 126,09 

• 232,3 

4-8 

• 000 , 114,3 

• 006,508 

• 176,63 

1 - 946,2

1-9 

• 143 , 682,2 

• 037,483 

—

• 176,64 

• 335,1 

4-9 

• 000 , 074,6 

• 004,652 

• 140,70 

1 - 839,4 

20 

• 134 , 977,5 

• 064,114 

— 

• 232,56 

• 446,6 

5-0 

• 000 , 048,3 

• 003,281 

• 110,17 

1 - 683,9 

21 

• 124 , 924,4 

• 092,473 

— 

• 289,45 

• 551,2 

5-1 

• 000 , 030,9 

• 002,284 

• 084,85 

1 - 498,6 

2-2 

• 113 , 944,4 

• 121,099 

— 

• 341,85 

• 627,7 

5-2 

• 000 , 019,5 

• 001,570 

• 064,29 

1 - 299,5 

2-3

• 102 , 451,8 

• 148,472 

— 

• 383,79 

• 652,2 

5-3 

• 000 , 012,2 

• 001,065 

• 047,96 

1 - 100,4 

2-4 

• 090 , 832,2 

• 173,147 

— 

• 409,50 

• 601,0

5-4 

• 000 , 007,6 

• 000,713 

• 035,27

• 910,7 

2-5 

• 079 , 425,1 

• 193,877 

— 

• 414,17 

• 455,3 

5-5 

• 000 , 004,6

• 000,472 

• 025,47

• 379,9 

2-6 

• 068 , 512,4 

• 209,710 

—

• 394,59 

+ - 205,6 

5-6 

• 000 , 002,8 

• 000,308 

• 018,15 

• 506,5 

2-7 

• 058 , 312,9 

• 220,052 

— 

• 349,67 

- - 145,4 

5-7 

• 000 , 001,7 

• 000,199

• 012,74 

• 437,0 

2-8 

• 048 , 980,8 

• 224,693 

— 

• 280,61 

- - 580,9 

5-8 

• 000 , 001,0 

• 000,127 

• 008,82 

• 349,5 

2-9 

• 040 , 609,7 

• 223,749 

— 

• 190,82 

- 1 - 070,4 

5-9 

• 000 , 000,6 

• 000,080 

• 006,02 

• 262,6 

3-0 

• 033 , 238,9 

• 217,715

—

• 085,59 

- 1 - 572,7 

6-0 

• 000 , 000,3 

• 000,050 

• 004,05 

• 193,9 
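The c1 column of Table III can be checked in the same spirit. Our own computation, not stated in this part of the text, suggests that c1 coincides with φ(t)(t³ + t)/4, where φ is the standard normal density. This is the magnitude of the familiar first-order term in the expansion of Student's integral in powers of 1/n. The sketch below assumes that identification and compares it with a few tabled values. We have not attempted the same for c2 to c4, and the sign with which c1 enters the correction is not given here.

```python
import math


def c1(t: float) -> float:
    """phi(t) * (t**3 + t) / 4, where phi is the standard normal density."""
    phi = math.exp(-t * t / 2) / math.sqrt(2 * math.pi)
    return phi * (t ** 3 + t) / 4


# (t, tabled c1) pairs read from Table III
for t, tabled in [(0.1, 0.0100231), (1.0, 0.1209854), (2.0, 0.1349775), (3.1, 0.0268622)]:
    print(f"t={t}: computed {c1(t):.7f}, tabled {tabled}")
```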
MATHEMATICS AND AGRONOMY*

[J. Amer. Soc. Agron. XVIII (1926), p. 703]

The nature of pure mathematics is such that the conclusions follow inevitably 
from the premises and may be said to be contained in them. Consequently, if in 
applying mathematics to affairs we reach absurd conclusions, we may be sure 
either that a blunder has been made or that in some essential point the data of 
the mathematical problem did not correspond to the facts. 

For it must be remembered that mathematical analysis deals with abstractions 
and that commonly the abstractions chosen are very much more simple than the 
facts, either in order to secure a generalized result, or because the analysis would 
otherwise become too difficult. 

Thus, even in the ordinary textbook problem we may have to deal with 
weightless ropes or frictionless pulleys, with basins which empty through the 
waste at a uniform speed regardless of the depth, or bricklayers who work at the 
same rate, however closely they may be crowded together. 

General Considerations 

It may be assumed then that if mathematical analysis applied to the interpretation
of agronomic experiments has given absurd or inconsistent results, it
is probably because the facts were not correctly represented by the abstractions
with which the mathematics dealt. It may, therefore, be worth while to consider
what limitations are imposed by the imperfect correspondence between the
conditions of our experiments and the mathematical abstractions from which
are constructed the tables which are used to interpret their results. It may also
be possible to find means of designing experiments so that they may be
interpreted with as little error as possible.

I shall begin by setting out, as far as may be, in non-mathematical language, 
the reasons which lead us to use certain tables in interpreting our experiments, 
and then examine the conditions under which we are justified in doing so. But 
however much it may be desired to avoid mathematical language, it is necessary 
to define a certain number of terms accurately, and in what follows the following 
words will be used in the sense given below: 

1. Variable . A quantity that can present more than one numerical value, 
e.g. height, birth-rate, the yield of a plot. 

* Personal contribution. Received for publication 26 March 1926. I should like to thank 
Prof. J. H. Parker and Dr H. Hunter, who kindly suggested that I should write this paper, 
Dr R. A. Fisher, “Mathetes”, and several other friends who have helped to clear up the 
obscurities of the original manuscript. 
2. Variate. An individual value of a variable, e.g. 5 ft. 10 in.; 19·63 per 1000;
191 lb.

3. Population. All the individuals under discussion. It should be noted that 
all these individuals need not exist. We may be dealing with a population of all 
individuals which could have existed under certain conditions. A population may, 
and generally does, vary in more than one character. It is necessary to be quite 
sure exactly what is the population with which we are dealing, and to remember 
that our conclusions cannot necessarily be extended to other populations. If, for 
example, we have a series of plots from which we deduce that one variety of oats 
will give a higher yield than a second, and all the experiments were carried out in 
an exceptionally dry summer, our population would be “Comparisons of yields in
an exceptionally dry summer”, and without further work it is obviously impossible 
to draw general conclusions applicable to comparisons of yields in all summers. 

4. Sample. A number of individuals selected to represent a population. 

5. Random Sample. A sample selected in such a way that any individual in 
the population has an equal chance of being included in the sample. It is always 
difficult, and often impossible, to discover anything definite about a population 
from a sample which is not random. 

6. Frequency. The number of variates occurring between any limiting values 
of a variable. 

Clearly, it is possible to give a geometrical representation of the frequencies 
occurring in any sample by setting out the scale of its variable horizontally along 
a base line and measuring vertically the frequency on each unit of the scale. 
This gives the familiar figure consisting of columns of equal width but of unequal 
height which is known as a histogram. 

If the sample is small we may have such a figure as this: 


Each square represents one variate; but if the sample is larger, it will take a 
more continuous form, such as this: 
As the numbers increase, the tendency is for the outline to become more and
more regular. 

It should be noted that, in practical affairs, the columns of the histogram 
must necessarily have a definite width, that of the unit of measurement; this
must be at least the width of the smallest measurable unit of the variable and 
is usually much wider. Thus, although weight can be measured in fractions of a 
gramme, the yields of plots are given to the nearest pound or cental. 

Nevertheless, we can imagine that if the unit of measurement were to be 
decreased indefinitely and the sample increased without limit so as to become 
an infinitely large population, the histogram with its irregular steps would be 
replaced by the smooth continuous curve which is known as a frequency curve. 
These frequency curves are necessarily abstractions; nobody ever reached one
by plotting out the frequency of a sample, but it is often comparatively easy by
following the instructions of mathematicians who have studied the subject to 
find the equation and draw the graph of a frequency curve which describes a 
population such that a given sample might have been drawn from it by random 
selection. While frequency curves are of many types, the only one to which 
attention need here be drawn is that discovered by Gauss and La Place, and 
known variously by their names and as the “Probability Curve” or “Normal 
Curve of Error”. This curve was reached by supposing that the error of an 
observation is the sum total of an infinite number of infinitely small components 
each of which may be either positive or negative, and it purports to give the 
frequency with which errors of any given magnitude occur. The following 
properties are of interest: 

1. It is symmetrical about a middle vertical line—the mean. 

2. The curve is completely determined if we know the total frequency which
it represents, i.e. the area between the curve and the base line, the mean, and
either (a) the average of the squares of the distances of the errors from the mean,
the mean square of error (called by R. A. Fisher the Variance), or (b) the
average distance of the errors from the mean, the mean error.*

* In view of the fact that some American writers have stated that it does not much matter
whether the probable error be calculated from the mean square (Bessel’s formula) or the
mean error (Peter’s formula), it may be as well to state categorically that it does matter.
R. A. Fisher (in Monthly Notices, Roy. Astron. Soc. June 1920) has shown that the latter
method is equivalent to wasting 12 % of the observations, since 100 cases treated by the
first give as accurate a measure of the probable error as 114 cases treated by the second.

3. The square root of the mean square of error is called the “Standard Error”
or “the Standard Deviation”, s.d.

4. 0·6745 times the standard deviation is called the “Probable Error”, and
is such that in this special type of curve one-half of the errors lie within a distance
of once the probable error on either side of the mean. It should be noted that
apart from this normal curve of error, the probable error has no exact meaning.

5. Tables giving the area of the curve lying between any given error, x, and
either the mean or −∞ have been constructed. In these, x is measured either
in terms of the standard deviation or of the “Modulus” c (c = s.d. × √2), and
the area as a fraction of the total area of the curve.

Since an unknown observation may fall with equal probability in any equal area
of the curve, these tables can be used to calculate the odds against an observation falling
beyond any required distance from the mean.

6. Many naturally occurring populations may be described very closely by a
normal curve of frequency, and are then completely determined by the total frequency,
the mean, and the s.d.

7. Although many populations exist which cannot be described by this curve,
the samples which we are able to obtain in agronomic work are generally too
small for us to be sure that the population they represent is not normal.

8. Even in the case of samples drawn from a population admittedly not
normal, the means of such samples belong to a population (of means) which
becomes more and more nearly normal the larger the samples.

It is, therefore, usual to assume, and in the case of large samples the assumption
can be made without appreciable error, that the published tables of the normal
curve can be used to calculate the odds against the mean of the sample differing
by more than any required amount from the mean of the population.

It should here be remarked that in order to be able to use the tables in this
way there must be a unit of measurement of the variation (standard deviation,
probable error, or modulus of error), and there are two ways in which this can
be arrived at.

The first way is that used by astronomers, routine analysts, and such people
as can repeat observations many times in a standard manner. Working in this
way, they can find a value of the s.d. from some hundreds of determinations of
the same quantity, and they can then use this figure for smaller numbers of
determinations in subsequent experiments.

The second way is more usual. It is to calculate the s.d. of the sample and
use this value instead of the s.d. of the population. This has the advantage that
at all events there can be no possibility of using the s.d. of the wrong population,
which, conceivably, might otherwise happen. But, on the other hand, very few 
series of experiments are sufficiently long to allow of an exact estimate of the 
s.d. being made. 
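The constants used in items 3 to 5 above, and the comparison in the footnote on the mean error, can be reproduced with Python's standard-library `statistics.NormalDist` in place of printed tables. This is a sketch under our own assumptions. `odds_against` is a hypothetical helper. The efficiency figure uses the standard large-sample result that, in a normal population, estimating the s.d. from the mean error gives a sampling variance π − 2 times that of the mean-square estimate.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Item 4: the probable error encloses half the errors, so it is the upper quartile
probable_error_factor = std_normal.inv_cdf(0.75)  # multiply by the s.d.

# Item 5: the "modulus" c is the s.d. times the square root of 2
modulus_factor = math.sqrt(2)


def odds_against(x_sd: float, two_sided: bool = False) -> float:
    """Odds against an error falling beyond x_sd standard deviations from the mean."""
    p_beyond = 1.0 - std_normal.cdf(x_sd)
    if two_sided:
        p_beyond *= 2
    return (1.0 - p_beyond) / p_beyond


# Footnote: the mean-error estimate has (pi - 2) times the sampling variance of
# the mean-square estimate, so 100 cases by the latter match this many by the former.
cases_needed_by_mean_error = 100 * (math.pi - 2)

print(f"probable error = {probable_error_factor:.4f} x s.d.")
print(f"odds against exceeding 2 s.d. on one side: {odds_against(2.0):.1f} to 1")
print(f"100 cases by the mean square ~ {cases_needed_by_mean_error:.0f} by the mean error")
```

Whether "beyond any required distance" should be read as one side or both sides of the mean is left to the caller through `two_sided`.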

For the s.d. determined from a sample is just as much subject to error as is 
the mean, and consequently, if x is to be measured in terms of the s.d. of the 
sample, the uncertainty of the conclusion is necessarily increased. Further, it 
does not follow that the frequency curve of means of samples when measured 
in this new unit, which is different for each sample, will any longer be found to 
approximate to the “normal” curve. In fact, it has been shown not to do so for 
small samples, and Student’s tables* have been constructed to meet the particular 
case of small samples drawn from a population which is itself normally distributed. 

This survey of the foundations on which the application of probability to 
affairs is based has doubtless seemed long and, I fear, tedious; yet even so, it
cannot be regarded as more than the merest sketch, and I shall be fortunate if 
it is even considered accurate by those entitled to an opinion on the subject. 
Nevertheless, we are now in a position to judge how far it is appropriate to use 
the two sets of tables, i.e. those of the normal curve, typified by Sheppard’s, 
and Student’s, in the interpretation of agronomic work. 

Application to Agronomy 

It may be assumed that the object of all agronomic experiments is to find out 
whether some change of practice is likely to benefit farmers who follow it. The 
change is commonly of manure or of seed, but sometimes of method of operation. 
In order to simplify matters, it is proposed in the first place to deal with a change 
typified by the replacement of one variety of cereal seed by another. 

Taking this simple case, the following points must be borne in mind when 
using the tables in order to judge of the significance of conclusions: 

1. The population to which the conclusions are to be applied is one of yields 
of cereals grown in fields on the large scale. That being so, the population of which 
the experiments are to be a sample must not differ in any essential point from this, 
and in particular must be coextensive with the possible large-scale population. 

Thus, if it is desired to estimate the result of replacing variety A by variety B
over an area part of which is affected by drought and part not, the experiments 
must be spread over land subject to both sets of conditions, and even then it is 
best to regard them as belonging to two separate populations. 

Similarly, in a variable climate (and where does the climate not vary?), the 
experiments must be carried over a series of years to correspond to that population 
of large-scale practice which is spread over the future. 


* [See pp. 29, 62-3 and 118-20 above. Ed.] 
Again, there is a disproportionate amount of border in any reasonable size of
experimental plot. This border must either be in contact with another variety 
or with ground unoccupied by crop. In either case the yield of the border strip 
is liable to be different from that of the interior, so that if the results are to be 
applied to the large-scale population of which the border forms a negligible 
fraction, it must be rejected. 

Lastly, as far as may be, large-scale methods of agriculture should be used. 
Granted that it is often not possible, there is a danger that results may not be 
applicable to the farmer’s case every time this principle is departed from, and 
every result obtained by small-scale methods should be rigorously checked on 
the large scale before making recommendations to the farmer. 

2. Generally speaking, but not necessarily, the population of large-scale yields 
with which we are concerned is a population of “differences”, i.e. some such 
question as the following is asked: “By how much may we expect the yield of 
variety B to exceed that of variety A if they were sown alternatively on the same 
soil in the same season?”

That being so, it is clear that the observed differences will not represent the 
true differences even in the sample plots as two crops cannot occupy the same 
place at the same time. Observed differences will miss the mark not only because 
the experimental soil and the weather experienced by the experiment may not 
be random samples of the soil and weather to be explored, but also because the 
actual plots laid out for the two varieties will usually differ in fertility. This is 
one of the largest sources of errors in field experiments. 

Nevertheless, we are still dealing with a sample of differences and it is clearly 
advantageous in this simple case to do all calculations in terms of differences.* 

This is not to say that percentages should never be used; that is another method 
of substituting one figure for two which has its uses, but percentages should be 
used with the greatest care; they are fertile mothers of fallacy.

3. In using either of the tables we assume that the experimental results are
a sample drawn from a population distributed normally. This is doubtless very
often nearly true, but the limited number of experiments usually prevents us
from being sure of it. What, then, is the extent of the uncertainty arising from
this cause? The answer is that if we have enough data no appreciable error is
introduced, since even if the population is not normal the distribution of the
means of large samples is very nearly so; but with very few repetitions we have
to fall back on the general experience that such frequencies as those of yields
are generally not badly represented by the normal curve, and hope for the best.
Fortunately, the approach to normality of the distribution of the means of samples
is very rapid, and appreciable errors are not likely to arise from this assumption
if we are dealing with the mean of more than a dozen repetitions.

* Note that the formula connecting the s.d. of a difference with those of its components
is

$$\sigma_{A-B}^2 = \sigma_A^2 + \sigma_B^2 - 2\, r_{AB}\, \sigma_A \sigma_B,$$

where $\sigma_{A-B}$ is the s.d. of A − B and so on, and $r_{AB}$ is the correlation between
A and B. Only if $r_{AB} = 0$ does this degenerate into what I may call the astronomer’s
formula:

$$\sigma_{A-B}^2 = \sigma_A^2 + \sigma_B^2.$$

In any well-planned experiment $r_{AB}$ is high, and there is considerable advantage in
calculating the odds according to the correct formula. By considering the differences at once,
we avoid all this difficulty of correcting for correlation.

In some American work the taking of differences seems to be considered the essential point
of what they are kind enough to call “Student’s Method”, but this old artifice must at least
date back to Noah, who doubtless had occasion to estimate the comparative appetites of
his male and female passengers.
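The footnote's formula for the s.d. of a difference, σ²(A−B) = σ²(A) + σ²(B) − 2 r(AB) σ(A) σ(B), can be checked numerically. The check also shows why working with the differences directly is simpler. The yields below are invented for illustration only and do not come from the paper. The helper name `covariance` is our own.

```python
import math
from statistics import mean, variance


def covariance(xs, ys):
    """Sample covariance with divisor n - 1, matching statistics.variance."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


# Hypothetical yields of varieties A and B on six paired plots
a = [31.2, 28.4, 35.0, 30.1, 33.7, 29.5]
b = [29.8, 27.0, 33.1, 29.4, 31.9, 28.2]

sd_a, sd_b = math.sqrt(variance(a)), math.sqrt(variance(b))
r_ab = covariance(a, b) / (sd_a * sd_b)

direct = variance([x - y for x, y in zip(a, b)])          # from the differences
with_r = sd_a ** 2 + sd_b ** 2 - 2 * r_ab * sd_a * sd_b   # full formula
astronomers = sd_a ** 2 + sd_b ** 2                       # assumes r_AB = 0

print(f"r_AB = {r_ab:.3f}")
print(f"variance of A - B: direct {direct:.3f}, formula {with_r:.3f}, r ignored {astronomers:.3f}")
```

With paired plots as closely correlated as these, ignoring the correlation overstates the variance of the difference many times over.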

4. Even supposing that an assumption of normality is justifiable, Student’s 
tables must be used in calculating from small samples the probability that the 
results could have occurred by chance. To use the other method is definitely 
wrong, especially as it gives too high an estimate of the reliability of the results. 
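The size of the error described in item 4 is easy to see for three plots, i.e. two degrees of freedom. In that case Student's distribution has the closed form F(t) = 1/2 + t / (2√(t² + 2)). In this hypothetical example the mean difference is three times its estimated standard error. The comparison below is our own illustration, not a calculation from the paper.

```python
import math
from statistics import NormalDist


def t_cdf_2df(t: float) -> float:
    """Exact P(T <= t) for Student's t with 2 degrees of freedom."""
    return 0.5 + t / (2.0 * math.sqrt(t * t + 2.0))


def odds(p_below: float) -> float:
    """Odds against exceeding t, given the probability of not exceeding it."""
    return p_below / (1.0 - p_below)


t_ratio = 3.0  # hypothetical: mean of 3 differences divided by its estimated standard error
odds_student = odds(t_cdf_2df(t_ratio))
odds_normal = odds(NormalDist().cdf(t_ratio))  # wrongly treats the sample s.d. as exact

print(f"Student's table: about {odds_student:.0f} to 1")
print(f"normal table:    about {odds_normal:.0f} to 1")
```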

5. That being so, the only object in calculating the probable error in such 
cases is to compare with other experiments. Even for this purpose it is necessary 
with small samples to divide by n − 1 and not by n to reach the mean square.
But indeed “probable errors” derived from only two or three cases are so subject 
to chance that it is somewhat doubtful whether any useful purpose is served 
by calculating them. For example, if 10 were the value of the “probable error” 
of a population and values were to be found from samples of two or three, only 
49 % of the values would lie between 5 and 15 in the case of samples of two and 
but 68 % in the case of samples of three. 
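The percentages above can be approximately reproduced under stated assumptions: a normal population, the n − 1 divisor, and a probable error that is a fixed multiple (0·6745) of the s.d. On these assumptions, "between 5 and 15" when the true value is 10 means that the sample s.d. lies between 0·5 and 1·5 times the population s.d. The sketch is ours. It gives about 48% and 67%, close to the figures quoted.

```python
import math
from statistics import NormalDist

Z = NormalDist()

# Samples of two: with divisor n - 1, s = sigma * |z| for a standard normal z,
# so we need 0.5 < |z| < 1.5.
p_two = 2 * (Z.cdf(1.5) - Z.cdf(0.5))

# Samples of three: 2 s^2 / sigma^2 is chi-square with 2 d.f., so s^2 / sigma^2
# is exponential with mean 1; we need 0.25 < s^2 / sigma^2 < 2.25.
p_three = math.exp(-0.25) - math.exp(-2.25)

print(f"samples of two:   {p_two:.1%}")
print(f"samples of three: {p_three:.1%}")
```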

The use of n − 1 as divisor is also necessary in calculating a standard deviation
from a large number of small samples of size n, which H. K. Hayes* proposes
to call the “Deviation from the Mean Method”.

It is necessary to remember that the correct formula to use is

$$\sigma = \sqrt{\frac{S(d^2) \times n}{N\,(n-1)}},$$

where d is the deviation from the mean of the sample and N is the total number
of deviations. When n is quite small this correction makes an appreciable
difference.
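Below is a minimal sketch of the corrected formula σ = √(S(d²) × n / (N(n − 1))) for many small samples that all have the same size n. The function name `pooled_sd` and the sample data are hypothetical.

```python
import math
from statistics import variance


def pooled_sd(samples: list[list[float]]) -> float:
    """s.d. from many small samples of equal size n: sqrt(S(d^2) * n / (N * (n - 1))),
    where d is each value's deviation from its own sample mean and N is the
    total number of deviations."""
    n = len(samples[0])
    if n < 2 or any(len(s) != n for s in samples):
        raise ValueError("need samples of a common size n >= 2")
    sum_sq = 0.0
    for s in samples:
        m = sum(s) / n
        sum_sq += sum((x - m) ** 2 for x in s)
    big_n = n * len(samples)
    return math.sqrt(sum_sq * n / (big_n * (n - 1)))


# Hypothetical duplicate determinations (n = 2)
pairs = [[10.1, 11.3], [9.8, 10.4], [12.0, 11.1], [10.7, 10.9]]
corrected = pooled_sd(pairs)
uncorrected = corrected * math.sqrt((2 - 1) / 2)  # omitting the n/(n-1) factor
print(f"corrected {corrected:.3f}, uncorrected {uncorrected:.3f}")
```

For samples of two, leaving out the correction understates the s.d. by a factor of √2.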

6. Frequency curves are reached by assuming an infinitely large population
and an infinitely small unit of measurement, and there is no trouble in imagining
an infinitely large population though we have only to deal with the finite sample
before us. But the unit of measurement must be the same for both, and, therefore,
not only not infinitely small but as large as is convenient or customary. This is
another of the discrepancies between the facts and the mathematics which does
not matter very much as long as the samples are large, but may make a good deal
of difference when they are small. With small samples the unit of measurement
should be quite small compared with the difference which is being measured.

* “Control of soil heterogeneity and use of the probable error concept in plant breeding
studies”: Minn. Agric. Exp. Sta. Tech. Bul. 30 (1925).
For example, Student’s table has been made to give absurd results by supposing
that all the values happened by chance to coincide, when the odds became 
infinite. The probability that results should have the same value is not negligible 
when the unit of measurement is large but becomes vanishingly small as the 
unit of measurement is decreased, until, in the limit, the infinite odds only occur 
infinitely seldom. Nevertheless, when the repetitions are few and very high odds 
are obtained by the use of Student’s tables, it is well to consider whether the 
result is not due to a value of the s.d. having occurred which is much smaller 
than usual, and if this seems likely, to discount the apparent certainty accordingly. 
The tables are calculated to give the odds correctly if all the available information 
is contained in the sample. If additional information is available, such as that 
the s.d. of similar experiments is usually larger, we are quite entitled to draw 
attention to it, even though it may not be possible to introduce it into the 
calculations. In fact, tables can only be an aid to, and not a substitute for, 
common sense. 
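The point about the unit of measurement can be made concrete. The sketch below is our own illustration; `prob_two_equal` and the ±10 s.d. truncation are our own choices. It computes the chance that two values from the same normal population, for example two plot differences, are recorded as identical when rounded to a given unit. If every value in a small sample coincides, the sample s.d. is zero and Student's t is infinite. The chance of such a coincidence is appreciable for a coarse unit and shrinks roughly in proportion to the unit.

```python
from statistics import NormalDist


def prob_two_equal(unit: float, sd: float = 1.0, span: float = 10.0) -> float:
    """Chance that two independent values from a normal population (s.d. `sd`)
    round to the same multiple of `unit`; cells beyond +/- span*sd are ignored."""
    dist = NormalDist(0.0, sd)
    k_max = int(span * sd / unit) + 1
    total = 0.0
    for k in range(-k_max, k_max + 1):
        p_cell = dist.cdf((k + 0.5) * unit) - dist.cdf((k - 0.5) * unit)
        total += p_cell * p_cell
    return total


for unit in (1.0, 0.5, 0.1, 0.01):  # unit of measurement, in s.d.s
    print(f"unit = {unit:>4} s.d.: chance of identical values = {prob_two_equal(unit):.4f}")
```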

7. The experiments must be capable of being considered to be a random 
sample of the population to which the conclusions are to be applied. Neglect of 
this rule has led to the estimate of the value of statistics which is expressed in 
the crescendo “lies, damned lies, statistics”. 

Well-conducted experiments can often be supposed to give results which are 
random samples of the population of possible differences between the yields of 
plots sown with varieties A and B which could be grown on the experimental 
area under climatic conditions similar to those of the season in which the experiments
were carried out, but it must be confessed that in some cases it is only
by courtesy that experiments can be considered to be a random sample of any 
population. In such cases the greatest care must be exercised in drawing 
conclusions. 

Nevertheless, we need not go as far as S. C. Salmon,* who says: “It is with
this source of error (soil heterogeneity) that Student’s method may entirely
fail”, and proceeds to illustrate this by a comparison of yields in a tillage experiment
carried out on two plots over a period of ten years. With all respect, I do
not think Salmon credits the user of the method with common sense. For he
supposes that as the result of this comparison it will be concluded that the fact
that one of the plots gave significantly higher yields than the other will be put
down to the tillage treatment.

* J. Amer. Soc. Agron. xvi (1924), pp. 717-21.

A moment’s consideration shows, however, that the population from which
the sample was drawn is a sample of differences of yield between these two plots
in all possible seasons; whereas, considered as a sample of difference in yield due
to tillage treatment in all soils similar to that experimented in, it is only one case,
from which, of course, no definite conclusions can be drawn either by Student’s
method or any other. If, however, ten repetitions had been made with an
arrangement of the plots which could be considered random, the population sampled
would have been that of “all similar soils”, and the error introduced by soil
heterogeneity would have been weighed and allowed for by the use of the tables.*

8. To sum up, the experiments must be conducted in such a way that their results
may be capable of being considered to be a random sample of the population to which
the conclusions are to apply. The unit of measurement must be small compared 
with the differences likely to be found, and the replications must be sufficient 
(a) to give significance to the mean difference, and (b) to give a sufficiently close
estimate of the variability to enable us to measure that significance with accuracy. 

And here it may be pointed out that in some cases, could we but know the 
variability accurately, very few experiments would be required to demonstrate 
significance. If, to take an extreme example, a difference of ten units is found 
between a single pair of experiments and it is known from other work that in 
this case the s.d. of a single difference is likely to be in the neighbourhood of 
two units, a considerable, though somewhat indefinite, degree of confidence 
could be reposed in the result. This leads me to suggest that a careful tabulation 
and examination of s.d.’s of experiments conducted at each station might be 
very valuable as showing within what limits the s.d. of a new experiment might 
be expected to lie, and what sort of weight might be given to a result which would 
otherwise lack significance owing to want of knowledge of the variability. Useful 
though this might be, it is clearly better to arrange the experiments so that we 
shall have sufficient replications to lead to significance without going beyond 
the experiment itself. 
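To put a number on the extreme example above, the Python sketch below (the function name is hypothetical) treats the s.d. of a single difference as though it were known exactly and refers the difference to the normal curve. Since in practice the s.d. is only "in the neighbourhood of" two units, the probability it gives overstates the confidence that can really be placed in the result, which is why the degree of confidence is called somewhat indefinite.

```python
import math

def normal_two_sided_p(difference: float, sd_of_difference: float) -> float:
    """Chance of a difference at least this large (either sign) when the true
    difference is zero and the s.d. of a single difference is known exactly."""
    z = abs(difference) / sd_of_difference
    # For a standard normal Z, P(|Z| >= z) = erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2))

# Student's illustration: a difference of ten units, s.d. of about two units.
p = normal_two_sided_p(10, 2)
print(f"difference = {10 / 2:.1f} s.d., two-sided P = {p:.1e}")
```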

Elsewhere† I have drawn attention to Beaven’s half-drill strip method of
comparing two varieties of cereals, a method which seems to me to fulfil the
necessary requirements when but two varieties are in question. Here I propose
to deal shortly with R. A. Fisher’s “Latin square” arrangement of experimental
plots. This arrangement is calculated to reduce and allow for the error introduced 
by soil heterogeneity and is suitable for work on any scale from rod rows or small 
rectangular plots up to large-scale plots of all sizes, provided always that the 
borders of small plots are discarded or that there is room enough for large plots. 

Fisher outlines the method of the Latin square on pp. 229 to 232 of his book

* As Hayes (loc. cit.) has also complained that when comparing different seasons’ yields
Student’s method does not allow for soil heterogeneity, I should like to emphasize that it
may be used to estimate the uncertainty due to the season or to the soil heterogeneity, or
even to both, provided we are satisfied that the experiments may be considered to belong to
a single population. To compare mere average yields in different seasons and then to
complain that no account has been taken of soil heterogeneity is as if a man were to feed
wheat into a mill and then complain that the resulting meal “had entirely failed to make
oaten porridge”.

† Biometrika, xv (1923), pp. 271 et seq. [11].
on Statistical Methods for Research Workers (Messrs Oliver and Boyd, Edinburgh),
and bases it on the following principles: 

1. If there are contributory sources of variation which are all independent, the
variance of the whole will be the simple sum of the variances contributed by all
the sources. As mentioned above, Fisher defines variance as the square of the
standard deviation, or in the case of errors, as the mean square of error. We may
therefore, for example, be able to analyse a total variance into (a) that part
contributed by the varieties (of seed or culture) having different yields; (b) that
part contributed by, say, an East to West heterogeneity of soil; (c) that part
contributed by, say, a North to South heterogeneity of soil; and (d) a random
effect of soil heterogeneity not included in (b) and (c), which are not random.

2. It is possible to arrange n plots of each of n different varieties in a square*
so that each row and each column of the square contains one plot of each variety,
but that otherwise the arrangement is “random”. Having done so, we can
estimate variances (a), (b) and (c), whence by subtracting their sum from the
total variance of the n² plots, we can estimate variance (d), which is now the
only one which affects the comparison between the varieties.

Fisher’s justification of his method might perhaps be considered to come under 
the head of mathematics, which we have agreed to avoid, so assuming its 
correctness we may proceed to illuminate the subject by the consideration of 
a simple example. 

Let us suppose that we are to test four varieties (A, B, C and D) of a cereal 
by sowing four plots of each in a Latin square. We have to arrange the 16 plots 
so that each row and column of the square contains one of each of the varieties, 
and yet the arrangement is otherwise to be random. We first proceed to draw 
a diagram with four rows and four columns to represent the 16 experimental 
plots. By suitably allocating four faces of a die, we can throw to find out which 
variety shall occupy the left top corner. Let us suppose B. We then proceed
along the top row, throwing a die each time, and get C and A. The fourth must
be D. Next, the left-hand column is suitably filled in by D, C, A. Note that when
there are only three possibilities two faces can be allocated to each variety, and
when only two, three faces.

The intersection of the second row and second column can now only be filled by A
or B, and a throw of the die makes it A. The intersection of the second row and
third column may be B or C, and we find C; the last of the second row is therefore
B, and the last column must read D, B, A, C from top to bottom, or there would be
two C’s in the third row.

* The actual shape of the Latin square will be similar to that of one of its constituent
plots and may therefore be only diagrammatically square. This is quite immaterial to the
argument.
Finally, the intersection of the third row and second column may be B or D,
and a throw of the die makes it B, which fixes the remaining three places:

    B   C   A   D
    D   A   C   B
    C   B   D   A
    A   D   B   C
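The die throwing can also be done by computer. The Python sketch below (function names are hypothetical) takes a different route from the cell-by-cell throws described above: it lists every admissible arrangement of the four letters by backtracking and then picks one of the list with equal probability, so that no square is favoured over another. Listing every square is practical only for small sides such as this one.

```python
import random
from typing import List

def all_latin_squares(symbols: str) -> List[List[str]]:
    """Every Latin square on `symbols`, each returned as a list of row strings."""
    n = len(symbols)
    grid = [[""] * n for _ in range(n)]
    found: List[List[str]] = []

    def fill(cell: int) -> None:
        if cell == n * n:
            found.append(["".join(row) for row in grid])
            return
        r, c = divmod(cell, n)
        for s in symbols:
            # Admissible only if not already used in this row or this column.
            if s not in grid[r][:c] and all(grid[i][c] != s for i in range(r)):
                grid[r][c] = s
                fill(cell + 1)
                grid[r][c] = ""

    fill(0)
    return found

def random_latin_square(symbols: str, rng: random.Random) -> List[str]:
    """Pick one square with equal probability from the complete enumeration."""
    return rng.choice(all_latin_squares(symbols))

for row in random_latin_square("ABCD", random.Random()):
    print("  ".join(row))
```

The square obtained by the die throws above (rows BCAD, DACB, CBDA, ADBC) is one member of the enumerated list.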


This, which was actually arrived at by die throwing, is one of the 288 possible 
arrangements, and we may further use it for purposes of illustration by supposing 
that the yields were those of the S.E. corner of Montgomery’s* diagram of plots
of Turkey wheat given on p. 37 of his classical “Experiments in Wheat Breeding”.

The yields in grammes are as follows:

    B 617   C 683   A 726   D 835
    D 602   A 662   C 640   B 700
    C 665   B 736   D 630   A 598
    A 609   D 706   B 790   C 678
(a) Variance of columns

Taking 2600 as a working mean, the deviations of the sums of columns from this are

    Deviation        Square
      −107           11,449
      +187           34,969
      +186           34,596
      +211           44,521
    Total +477      125,535
    Deduct ¼(477)²   56,882.25
                     68,652.75 ÷ 4* = 17,163.1875

* Divide by 4 because we have worked with totals and we want to change to means. 
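Steps (a) to (d) all follow one pattern: square the deviations from a working mean, subtract the square of their sum divided by their number, and, for sums of totals, divide by the number of plots in each total. A minimal Python sketch of that pattern (the function name is hypothetical):

```python
def sum_of_squares(deviations, group_size=1):
    """Sum of squares about the mean from deviations about a working mean.

    (sum of deviations)**2 / k corrects for the working mean differing from
    the actual mean; dividing by `group_size` brings sums of squares of totals
    back to the scale of single plots (Student's "divide by 4").
    """
    k = len(deviations)
    total = sum(deviations)
    raw = sum(d * d for d in deviations)
    return (raw - total * total / k) / group_size

print(sum_of_squares([-107, 187, 186, 211], group_size=4))  # columns: 17163.1875
print(sum_of_squares([261, 4, 29, 183], group_size=4))      # rows: 11396.1875
print(sum_of_squares([-5, 243, 66, 173], group_size=4))     # varieties: 9119.1875
```

With `group_size=1` the same function gives the total in (d) from the sixteen plot deviations about 680.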


(b) Variance of rows

As above, deviations of sums of rows from 2600 are

    Deviation        Square
      +261           68,121
        +4               16
       +29              841
      +183           33,489
    Total +477      102,467
    Deduct ¼(477)²   56,882.25
                     45,584.75 ÷ 4 = 11,396.1875

* U.S. Dept. Agric. Bur. Plant Indust. Bul. 269.
(c) Variance of varieties

Deviations of sums of varieties from 2600 are

    Variety   Deviation      Square
       A          −5             25
       B        +243         59,049
       C         +66          4,356
       D        +173         29,929
    Total       +477         93,359
    Deduct ¼(477)²           56,882.25
                             36,476.75 ÷ 4 = 9,119.1875


(d) Total variance

Taking 680 as a working mean, the deviations of the yields of the individual plots are as follows:

    Yield of plot − 680     Squared
           −63               3,969
            +3                   9
           +46               2,116
          +155              24,025
           −78               6,084
           −18                 324
           −40               1,600
           +20                 400
           −15                 225
           +56               3,136
           −50               2,500
           −82               6,724
           −71               5,041
           +26                 676
          +110              12,100
            −2                   4
    Total   −3              68,933
    Deduct (1/16)(3)²            0.5625
                            68,932.4375


We have next to perform an operation analogous to that of multiplying by
√{n/(n − 1)} in the case of finding the s.d. from a sample of n. Fisher’s way of
doing this is given below in the table of the analysis of the variance.


    Variance     Degrees of    Sum of        Variance    Standard
    due to       freedom       squares                   deviation
    Varieties        3          9,119.19
    Columns          3         17,163.19
    Rows             3         11,396.19
    Remainder        6         31,253.87     5,208.98      72.2
    Total           15         68,932.44




In the above table the first column is descriptive of the variance arising from
different sources.

We are chiefly concerned with that entitled “Remainder”, which enables us
to arrive at an estimate of the random errors which are not associated either
with variety or with that part of soil heterogeneity common to whole rows or
columns. The second column gives the “Degrees of freedom”. In the first,
second, third and fifth rows of the table the degrees of freedom merely represent
one less than the number in the sample (4 varieties, 4 columns, 4 rows, and 16 plots
altogether) and are strictly analogous to the n − 1 quoted above. The number
in the fourth row is obtained by making the first four rows add up to the total
of the fifth.

The principle of degrees of freedom is widely applied by Fisher, and the idea 
behind it is that if there are a number n of variates of which the mean is used 
in the calculation, all but one of them can take any possible value; but when 
n − 1 values have been chosen the last one is fixed by the mean, so that only
n − 1 variates are free to vary. If, in addition, some other statistic is used, such
as the s.d., only n − 2 of them can be varied, and so on.

In this case there are fifteen degrees of freedom in the total and three sets of 
three degrees of freedom are taken up by the varieties, the rows, and the columns, 
leaving six for the determination of the variance of the random error. 

In the third column the first, second, third and fifth rows are the sums of 
squares calculated in (c), (a), (b) and (d) above, and the fourth is found by 
making the first four rows add up to the last. 
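The whole table can be reproduced in a few lines. The Python sketch below is a modern restatement rather than Student's hand method: it works from the yields directly instead of from a working mean. It assumes that each yield is 680 plus the plot deviation listed in (d), with the deviations read across the square row by row and the fifth one taken as −78 (which its square, 6,084, and the total of −3 require); that layout reproduces the row, column and variety totals used in (a), (b) and (c). The names `SQUARE`, `YIELDS` and `latin_square_anova` are hypothetical.

```python
from collections import defaultdict

# Assumed data: the square from the die throws, with each yield taken as
# 680 + the plot deviation listed in (d), reading the square row by row.
SQUARE = ["BCAD", "DACB", "CBDA", "ADBC"]
YIELDS = [
    [617, 683, 726, 835],
    [602, 662, 640, 700],
    [665, 736, 630, 598],
    [609, 706, 790, 678],
]

def latin_square_anova(square, yields):
    """Return (sums of squares, remainder d.f., remainder variance, remainder s.d.)."""
    n = len(square)
    values = [y for row in yields for y in row]
    correction = sum(values) ** 2 / (n * n)

    def ss_of_totals(totals):
        # Each total is the sum of n plots, hence the division by n.
        return sum(t * t for t in totals) / n - correction

    rows = [sum(row) for row in yields]
    cols = [sum(yields[r][c] for r in range(n)) for c in range(n)]
    by_variety = defaultdict(int)
    for r in range(n):
        for c in range(n):
            by_variety[square[r][c]] += yields[r][c]

    ss = {
        "Varieties": ss_of_totals(by_variety.values()),
        "Columns": ss_of_totals(cols),
        "Rows": ss_of_totals(rows),
        "Total": sum(v * v for v in values) - correction,
    }
    # As in the table, the remainder is whatever the three named sources leave.
    ss["Remainder"] = ss["Total"] - ss["Varieties"] - ss["Columns"] - ss["Rows"]
    df_remainder = (n * n - 1) - 3 * (n - 1)
    variance = ss["Remainder"] / df_remainder
    return ss, df_remainder, variance, variance ** 0.5

ss, df, variance, sd = latin_square_anova(SQUARE, YIELDS)
for source, value in ss.items():
    print(f"{source:<10} {value:12.4f}")
print(f"remainder d.f. = {df}, variance = {variance:.2f}, s.d. = {sd:.1f}")
```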

In the fourth column the required variance is given by dividing the figure
in the third column by that in the second, and from this the s.d. is obtained by
extracting the square root. If this had been found from enough degrees of
freedom, we could find the s.d. of the difference between two varieties appropriate
to use with tables of the normal curve by dividing by √(4/2) (the 2 in the denominator
being due to the fact that we are to judge of the significance of a difference, and
the 4 to the number of replications), which would give a s.d. of about 50, while
the greatest difference between “varieties” is only about 60. Obviously, this
would not be significant, as indeed in this example it should not be, the “varieties”
being all the same (Turkey Red wheat). In fact the significance is even less, as
with only six degrees of freedom Student’s table must be used.*
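The arithmetic of the preceding paragraph, as a short Python check (variable names are illustrative):

```python
import math

remainder_sd = 72.2   # s.d. of a single plot, from the table (6 d.f.)
replications = 4      # plots of each variety
# Dividing by sqrt(4/2) is the same as multiplying by sqrt(2/4).
sd_of_difference = remainder_sd / math.sqrt(replications / 2)
largest_difference = 62   # B - A, in grammes, from the variety totals
ratio = largest_difference / sd_of_difference
print(f"s.d. of a difference = {sd_of_difference:.1f}; B - A = {ratio:.2f} s.d.")
```

A difference of little more than one s.d. would not be significant even on the normal curve.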

Unfortunately, Student’s tables were constructed some time before the Latin 
square was thought of, and it requires some care to enter the table aright. 

In the first place we have to enter the table under the heading n = 7, one
more than the degrees of freedom, since if Student’s table had been headed with
the degrees of freedom, the headings would have been one less.

Secondly, to obtain z, we divide the difference (say B − A, which is 62) by
the s.d. × √2 × √(6/4), in which the √2 corresponds to the fact that we are considering
a difference, and the √(6/4) to the fact that the original table was constructed so as
to give the probability for means of 7, with the s.d. found by dividing by 7 rather than
by the 6 degrees of freedom, while we only have means of 4 (√(6/7) × √(7/4) = √(6/4)).
z is here, therefore, 62/(72.2 × √3) = 62/125, or just under 0.5, which if looked out in
this table under n = 7 gives P = 0.86, a satisfactorily non-significant result.
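The same figures can be checked with the later t notation. The Python sketch below relies on the relation z = t/√(n − 1) between Student's original z (entered under heading n) and t with n − 1 degrees of freedom, and obtains P by integrating the t density numerically with Simpson's rule. The integration is a stand-in for reading Student's printed table, not his procedure; the function names are hypothetical.

```python
import math

def t_density(x: float, df: int) -> float:
    """Density of Student's t with `df` degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x) for x >= 0: symmetry about zero plus Simpson's rule on [0, x]."""
    h = x / steps
    area = t_density(0.0, df) + t_density(x, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_density(i * h, df)
    return 0.5 + area * h / 3

difference, sd, replications, df = 62, 72.2, 4, 6
t = difference / (sd * math.sqrt(2 / replications))
z = t / math.sqrt(df)   # Student's original z, to be entered under n = df + 1 = 7
print(f"z = {z:.3f}, t = {t:.3f}, P = {t_cdf(t, df):.3f}")
```

The printed P is the probability of a smaller value of z arising by chance, comparable with the 0.86 read from the table.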

* This applies to the tables given in Biometrika, vi, p. 19 and xi, p. 416; and those in the
new edition of Tables for Statisticians and Biometricians. The tables in Fisher’s Statistical 
Methods for Research Workers and those which are to appear in the next number of Metron 
are given under the headings of the degrees of freedom. [The Biometrika and Metron tables 
are those printed on pp. 29, 62-3 and 118-20 of this volume. Ed.] 
Looked at from another point of view we should require a difference between
varieties of not less than 100 grammes, or some 15 %, for it to be worth while 
testing under such conditions as Montgomery’s with only four plots of each 
variety. 

I have illustrated the method on a Latin square of four plots per side, choosing 
a small number so as to make it easy to follow the arithmetic, but in point of 
fact four replications are decidedly too few and much larger squares are
recommended. One of the disadvantages of this particular illustration has been that
whereas usually the variance is much reduced by the subtraction of that associated 
with rows and columns, there has by chance been very little reduction in this case. 

The following table gives other possibilities:

    Number of   Number of      Total number   Degrees of freedom   Heading of column to   Factor to multiply
    varieties   replications   of plots       for calculation      be used in Student’s   s.d. by in
                                              of error             tables                 calculating z*
        4            4              16               6                      7              √(2 × 6/4)
        5            5              25              12                     13              √(2 × 12/5)
        6            6              36              20                     21              √(2 × 20/6)
        7            7              49              30                      -              Use normal curve with
        4           16              64              46                      -              s.d. × √(2/†)

† Number of replications.  ‡ Three less than the number of replications.


In the last case there will be two replications in each row and column, and care 
must be taken that the arrangement is really random, e.g. if one plot of A has 
been fixed in a row the chance of filling the next with A must be only half that 
of filling with one of the other letters not yet represented in the row. 
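The degrees-of-freedom column follows the same bookkeeping as the 4 × 4 example: one less than the number of plots, minus one less than the number of rows, of columns and of varieties. A Python sketch (the function name is hypothetical); the last layout is taken to be an 8 × 8 square, which is what 64 plots with two replications of each of four varieties in every row and column implies.

```python
def error_df(side: int, varieties: int) -> int:
    """Remainder d.f. for a side x side square after rows, columns and varieties."""
    plots = side * side
    return (plots - 1) - (side - 1) - (side - 1) - (varieties - 1)

# (side of square, number of varieties); the 8 x 8 case puts each of four
# varieties twice in every row and column.
for side, varieties in [(4, 4), (5, 5), (6, 6), (7, 7), (8, 4)]:
    df = error_df(side, varieties)
    reps = side * side // varieties
    print(f"{varieties} varieties, {reps} replications, {side * side} plots: "
          f"{df} d.f., Student's heading n = {df + 1}")
```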

Conclusion 

To sum up, in planning agronomic experiments use plenty of replications and 
make quite sure that your results are capable of being considered to be a random 
sample of the population about which you wish to draw conclusions. 

* [It should be remembered that this is the z of Student’s original notation [2, p. 17 
above] and not the quantity defined by R. A. Fisher and now generally used in the 
analysis of variance. Ed.] 
ERRORS OF ROUTINE ANALYSIS

[Biometrika, XIX (1927), p. 151] 

Introduction. Dr E. S. Pearson, Biometrika, xviii, p. 192, has given the
moment coefficients of the distributions of range in small samples drawn from
the normal population when the number in the sample lies between 2 and 6.
Mr L. H. C. Tippett, Biometrika, xvii, pp. 364-87, had already provided similar
data for samples of 10, 20 and 60, but Dr Pearson gives improved values in his 
Table VIII which I have used. 

These constants provide a means of drawing curves which approximate closely 
to the actual frequency curves of the distribution of ranges, apparently sufficiently 
closely for us to use their integrals as probability integrals for the occurrence of 
ranges of fairly large size. 

Thus the real frequency curve for range in samples of two is known to be a half
normal curve of standard deviation √2·σ, whereas the Pearson curve found from
the moments is a Type I with equation

$$y = 0.543\left(1 + \frac{x}{0.574}\right)^{0.569}\left(1 - \frac{x}{6.623}\right)^{6.569}.$$


If they be drawn on the same scale, Fig. 1, we see that for the greater part of 
the way the two curves are practically identical. 

Assuming then, as seems likely, that the approximation in the case of the 
larger samples is even closer than in the case of samples of two, we have here a 
means of determining the probability of occurrence of ranges of given size in the 
case of quite small samples, assuming as always a normal population. Now it is 
just in the case of these small samples that most of the tests which have been 
proposed for the rejection of observations fail; there is no possibility of finding the 
true mean of the population. Mr J. O. Irwin, Biometrika, xvii, pp. 238-50, has,
it is true, proposed to use Galton’s differences for this purpose, but on the other 
hand, there are cases in which the true standard deviation of the population is 
known with some approach to accuracy, and it seemed in such cases Dr Pearson’s 
work should enable us to reject determinations so widely spread as to render the 
occurrence of the observed range unlikely to any specified degree. 
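For samples of two the exact distribution is known, so the rule can be sketched without Pearson's fitted curves. Assuming a normal population whose σ is known (the case described above), the Python sketch below gives the chance that two determinations differ by at least a given amount, and the range that would be exceeded only with a chosen small probability. Larger samples would need the Type I curves, which are not reproduced here; the function names are hypothetical.

```python
import math

def prob_range_of_two_at_least(r: float, sigma: float) -> float:
    """P(range >= r) for two observations from a normal population with known sigma.

    The difference of the two is normal with s.d. sqrt(2)*sigma, so the range is
    half-normal and P(R >= r) = erfc(r / (2 * sigma)).
    """
    return math.erfc(r / (2 * sigma))

def critical_range(p: float, sigma: float) -> float:
    """Range exceeded with probability p, found by bisection."""
    lo, hi = 0.0, 20 * sigma
    for _ in range(100):
        mid = (lo + hi) / 2
        if prob_range_of_two_at_least(mid, sigma) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for p in (0.05, 0.01):
    print(p, round(critical_range(p, sigma=1.0), 3))
```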

Happening to mention to Dr Pearson that I proposed to apply his work to the 
rejection and repetition of analytical results, he suggested that the readers of 
Biometrika might be interested both in the application and, indeed, in a
description of the errors of routine analysis from which the necessity of rejection arises.

In endeavouring to fall in with this suggestion, I propose to set out, firstly, 
what routine analyses are, and to what sort of errors they are liable, secondly, 
the advantages that accrue from a statistical examination of these errors; and, 
lastly, the bearing of Dr Pearson’s paper on the vexed question of the repetition 
and rejection of results. 

At the outset I may state that, though no analyst, I have been in close touch 
for some years with a routine laboratory, the authorities of which have very kindly 
supplied me with some of their results for the purpose of the present paper. 



[Fig. 1. Pearson’s approximation, Type I curve y = 0.543(1 + x/0.574)^0.569 (1 − x/6.623)^6.569, compared with the actual curve, half of a normal curve.]
Routine Analysis. The difference between research and routine is fundamental
to the scope of the present paper, and it lies in the relation between the analyst 
and his work rather than in the actual process of the analysis. Thus, what is at 
one time a research involving concentrated thought and watchfulness on the part 
of the analyst may, later on, become the merest routine; every step known and 
prepared for in advance, and requiring not the resourcefulness of the chemist of 
high degree but the machine-like accuracy of the well-trained assistant. 

This is not to say either that research may not make use of routine processes, 
as it frequently does, or that routine processes may not form the subject of 
research, as they constantly should; yet, broadly speaking, we are not concerned 
with the distinguished chemist who determines the atomic weight of an element 
to n places of decimals, and has theories about the value of the (n + 1)th; it would
be impertinence to talk of errors in such a connexion. 

No, we are going to deal with the chemist who has to make similar analyses 
day after day and year after year; with, for example, the public analyst who 
provides evidence to convict the milkman of watering his milk, and the grocer of 
sanding his sugar; with the works chemist, who maybe spends his whole life in 
determining the acidity or alkalinity of solutions; or again, with the assayer, who 
must find out which of innumerable samples of ore are payable. 

There is often enough little or no scientific interest in such determinations, 
yet their practical value is in the aggregate enormous; the application of science 
to industry would without them be all but impossible. 

These people are not so much troubling themselves about the nth place of
decimals; their problem is to get results as quickly and as cheaply as possible; 
quickly, because events may be waiting upon them, and cheaply for reasons that 
need hardly be elaborated. 

They must, however, attain sufficient accuracy for the purpose in hand, which 
is generally concerned with the third figure rather than the fourth, and is often 
enough satisfied with the second. Nevertheless without this minimum of accuracy 
the analysis is worthless, so that the chemist in charge of the laboratory has to 
make himself very sure that it is reached. 

Obviously he cannot be sure, unless he has made some determinations of the 
error, and he can only reduce his error if he has a working knowledge of the 
sources of error. 

Sources of Error. The first of these, very often the chief of them, is not strictly
a laboratory error; it arises from the difficulty of obtaining a sample in a bottle 
which shall represent perhaps some tons, or even hundreds of tons, of material. 
This difficulty of sampling provides a convenient excuse for discordant results, but 
the wise chemist will see to it that the sample is drawn in a manner which will 
rob this excuse of any appreciable validity. And that is by no means easy: but 
the errors of commercial sampling do not fall within the scope of this paper, so 
I do not propose to say more about them here. 
Nor do I propose to deal with the allied problem of subsampling the sample
which has been received for analysis. This may be in the case of solids quite a 
difficult matter, and can lead to appreciable error unless a suitable technique is 
employed. 

After this, each operation of the analysis contributes its error; I am told that 
the standard error of weighing on a balance is about one in two thousand; all 
analyses involve at least two weighings and there are often more. Then we have 
such things as titration, generally contributing quite a small error; transfer of 
material from one vessel to another; digestion at a uniform temperature; filtration, 
incineration, and so forth; all these add their quota. 

These errors are not necessarily symmetrical, some of them involve loss of 
material, and for this reason a chemist will sometimes prefer the higher of two 
results. 

Perhaps a description of a very simple analysis may illustrate the kind of thing 
that happens. Let us suppose that it is required to estimate the percentage of 
moisture in a sample of grain, not as part of a research but for the commercial 
valuation of a large bulk in a ship or warehouse; it will very likely be one of a 
number of analyses the results of which will be required by the next day. 

First, the sample is subsampled and a weighed portion of the ground-up 
material is put into an oven on a small tray. The oven is kept at a constant 
temperature for a fixed number of hours, the tray is then removed, cooled over 
concentrated sulphuric acid and quickly weighed. The loss of weight is taken to 
be the moisture present in the weighed quantity which was put into the oven. 

Here we have the errors of subsampling, grinding, two weighings, and of 
driving off moisture by heat; hardly any one of these operations is as simple as it 
sounds. The grinding, for example, whether done in a mill or with a pestle and 
mortar, leaves material on the grinding surfaces; this material is not the same as 
the bulk but is composed of the finer or more adhesive part of it. It is, therefore, 
necessary to grind and throw away a small quantity before dealing with the 
portion which is to be weighed. Then we have the fact that organic matter 
exposed to the atmosphere, generally if not always, tends to get into equilibrium 
with the moisture in the air, hence both grinding and weighing must be done rapidly. 
When in the oven the loss of weight will depend not only on the exact temperature
and time, but on the ventilation of the oven and the number of samples in it.
Nor is all the loss necessarily moisture; carbon dioxide may either be formed and
lost by oxidation, or be lost by splitting off from some already oxidized compound. 
We may even get the estimation too low owing to an increase of weight due to 
absorption of oxygen. 

Of course, in a research one would work in an atmosphere of nitrogen and 
weigh the moisture absorbed by phosphorus pentoxide, determining one sample 
at a time and weighing at intervals until the weight became constant, but routine 
analysis has neither time nor money for this: it has to rely on keeping the
conditions constant. The most it can do is to check an occasional result by the more
lengthy method. 

All this sounds as if the results would be very inaccurate, yet it is not so. The
moisture of grain, lying between 10 and 20 %, can be determined with a standard
deviation measured in percentage moisture of about 0.2, or 1 part in 500.
Naturally, different laboratories, using different ovens set up under different
conditions do not necessarily agree with one another, but they will probably agree 
to this order of accuracy in their relative estimates when comparing different 
samples, and that is usually what is required. 

We now come to a phenomenon which will be familiar to those who have had 
astronomical experience, namely that analyses made alongside one another tend 
to have similar errors; not only so but such errors, which I may call semi-constant,
tend to persist throughout the day and some of them throughout the week or the
month.

Why this is so is often quite obscure, though a statistical examination may 
enable the head of the laboratory to clear up large sources of error of this kind: it 
is not likely that he will eliminate all such errors. 

The chemist who wishes to impress his clients will therefore arrange to do 
repetition analyses as nearly as possible at the same time, but if he wishes to 
diminish his real error he will separate them by as wide an interval of time as
possible. Here are some examples: 

In 1905 a quantity of material was taken, mixed as well as possible and stored
in Winchester bottles. Samples were taken from these and analysed daily between 
the beginning of April and the end of August—100 in all. This, though statistically 
speaking a small sample, represents an amount of work which a routine chemist 
will not easily be persuaded to undertake. 

At each analysis seven items were determined and of these I have now 
examined five: all are troubled to a greater or less extent by semi-constant errors, 
as is most easily shown by a comparison of twice the variance of a single analysis 
with once that of the difference between consecutive observations: if the
arrangement were random they would of course be the same within the error of random
sampling. 


Table I

    Item    Twice       Variance of    Correlation between
            variance    difference     consecutive analyses
     1      2.20        1.60           +0.27
     2      0.625       0.434          +0.31
     3      0.0748      0.0606         +0.19
     4      0.171       0.157          +0.09
     5      5.42        4.68           +0.09
Of course, not all of these correlation coefficients are individually significant,
but they are illustrative of a general phenomenon. I do not recollect having met
with a case where the correlation was negative.
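The comparison in Table I rests on the identity Var(x[i] − x[i−1]) = 2 Var(x)(1 − ρ) for a series whose consecutive errors have correlation ρ, so a variance of differences smaller than twice the variance points to a positive ρ. A Python sketch (function names hypothetical; population variances are used, and in general the implied ρ need not agree exactly with a directly computed correlation):

```python
import statistics

def implied_serial_correlation(twice_variance: float, variance_of_difference: float) -> float:
    """rho implied by Var(x[i] - x[i-1]) = 2 * Var(x) * (1 - rho)."""
    return 1 - variance_of_difference / twice_variance

def table_i_columns(series):
    """(twice the variance of the series, variance of consecutive differences)."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    return 2 * statistics.pvariance(series), statistics.pvariance(diffs)

# Table I, items 1 to 3: the implied rho is close to the tabulated correlation.
for twice_var, var_diff in [(2.20, 1.60), (0.625, 0.434), (0.0748, 0.0606)]:
    print(round(implied_serial_correlation(twice_var, var_diff), 2))
```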

The two top lines of dots in Fig. 2 give the individual analyses of items 2 
and 3, the latter of which gives the percentage of moisture in the samples. The 
lines across the diagrams show the mean values of these. 

I will now give another case of a routine analysis repeated in a time series. 
Here, as a check on the accuracy of the estimation of nitrogen by the Kjeldahl
method, a determination of the nitrogen in pure crystalline aspartic acid was made
about once a week from 1903 to the present time. The method is a standard one
for the determination of “amino” nitrogen in organic matter. A weighed quantity
of the substance to be analysed is digested in strong sulphuric acid which destroys
the organic matter and converts the nitrogen into ammonium sulphate. Excess of
alkali is then added and the nitrogen distils over in the form of ammonia and is
caught in a measured quantity of acid, where it is estimated by titration of the
excess acid with deci-normal soda.

Of course the amount of nitrogen in a crystalline substance can be calculated
within narrow limits, and the third row in Fig. 2 gives the calculated (as a straight
line) and the actual (as spots) from 29 April 1924 up to the end of 1926.
At first the results were all too low, but the details of the process were under
examination and the later estimates have risen and the variance has decreased
owing to improvements which have been effected: simultaneously, the time taken
has been reduced by half. For about six months before the beginning of last
November the results were remarkably good; one could have calculated the
atomic weight of nitrogen from the mean with an accuracy which would hardly
have disgraced research, but there has since been a falling off, the average of the
last seven being rather over 1 % too low. This illustrates the sort of difficulty which
arises in routine analysis, for no one is conscious of any alteration in method, nor
has a close search revealed the cause of the change.

The error statistics which I have cited up to the present have all been obtained
by the laboratory in the course of investigation into, and control of, its error.
I am now going to give some figures taken from some published analyses which
seem to me to show that similar “semi-constant” errors probably exist in another
laboratory; it would surprise me to find any laboratory without them, but it is
only by chance that they become apparent unless they are deliberately sought for.
The analyses are published in the Report on the Sugar Beet Experiments, 1925,
issued and distributed without charge by the Department of Agriculture of the
Irish Free State.

These experiments were conducted at 424 farms, all the twenty-six counties
being represented, and the complete programme, consisting of two plots of each
of four varieties, one top dressed with nitrate of soda and one not, was successfully
carried out in 163 cases; in another 190 it was found necessary to top dress all the
plots, and the remaining 71 cases fell through for one reason or another. It is 
the 163 complete results with which I propose to deal. 

It will be seen that each farm produced eight different lots of beet and as each 
of these was analysed to find the percentage of sugar we can average the figures to 
get the percentage of sugar for the farm. Further, the date on which the analyses 
of each farm were carried out is given in the report, and in Fig. 3 are given the 
averages of analyses made on the same day as central points with lines extending 
upwards and downwards showing the extent of twice the standard deviation of
the mean of the number of analyses, ranging from 8 (one farm) to 96 (twelve farms),
where

    a = average of a farm,
    ā = mean of a day’s analyses,
    n = number of farms analysed in the day.


It is obvious from an inspection of the figure that there was a distinct rise of 
sugar between the beginning and end of November, which is doubtless due to the 
gradual maturing of the roots, but it is not easy to account for the marked dip 
shown by the analyses carried out on the 5th, 7th and 8th of December, on any 
other supposition except that of laboratory error. 

The thirteen farms, the produce of which was analysed on those dates, were in 
five counties, so that the roots were sent up by five different men and may be 
considered a random sample of the material to be analysed in the early part of 
December. It has been suggested that the loss of sugar was due to the action of 
frost on the roots before they were drawn from the ground or whilst in transit 
from the farms to the State Laboratory. From inquiries which I have made I am 
satisfied that the lower sugar content is not attributable to such action for such 
frosts as were experienced did not apparently affect the leaves, let alone the roots, 
and the packing of the beet to ensure its arrival in fresh condition at the laboratory 
obviated any possibility of freezing in transit. 

I have also been informed that beetroots lose sugar when they are clamped. 
I am assured, however, that none of the samples to which the report relates was 
pitted or clamped but that each sample of roots was washed, topped or crowned 
and dispatched to the laboratory immediately after being taken out of the ground. 
The roots were forwarded by passenger train so as to secure quick transit, were 
unpacked immediately before the analysis was commenced and, as a rule, the 
analysis of a sample was completed within twenty-four hours of its receipt in the 
laboratory. It seems likely, therefore, that the low results were due to errors of a 
similar nature to those which were observed in the other laboratory. 

To embark on a long series of analyses in order to determine error is always a 
considerable undertaking and is often impossible owing to the tendency of organic 
substances to change with time: added to this, unless special precautions are 
taken, such as were taken in 1905, the operators may, in spite of themselves, be 
more careful when analysing special samples of this kind, so that the series may 
not represent a random sample of analytical errors. 

It is convenient, therefore, to take advantage of the fact that important 
analyses are often repeated as part of the routine and to calculate the standard 
deviation of the error from the differences between pairs by simply dividing the 
variance of the differences by 2 and taking the square root. 
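The pair-difference estimate can be written out as a short Python sketch. It assumes that each pair is two analyses of the same material, so that the true difference is zero and the mean square of the differences serves as their variance. The function name and the duplicate figures are hypothetical.

```python
import math


def error_sd_from_pairs(pairs):
    """Estimate the s.d. of a single analysis from duplicate pairs.

    If each analysis has error s.d. sigma, the difference between the two
    members of a pair has variance 2 * sigma**2, so sigma is the square root
    of half the variance of the differences.
    """
    diffs = [a - b for a, b in pairs]
    # Duplicates of the same material should differ by zero on average,
    # so the variance is taken about zero rather than about the sample mean.
    var_diff = sum(d * d for d in diffs) / len(diffs)
    return math.sqrt(var_diff / 2)


# Hypothetical duplicate analyses (per cent) from routine work.
pairs = [(14.1, 14.3), (13.8, 13.6), (15.0, 15.2), (14.4, 14.2)]
print(round(error_sd_from_pairs(pairs), 3))
```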

I give in Table II the standard deviations of the errors of items 1 to 5, whose
variances I gave before, together with further determinations made from the
differences between 100 pairs analysed in 1925 and 100 pairs analysed in 1926.

Table II

Item   s.d. 1905   s.d. of error from differences between pairs
                   1905      1925      1926
1      1.048       0.895     0.731     0.660
2      0.559       0.466     0.386     0.523
3      0.193       0.174     0.138     0.152
4      0.293       0.280     0.326     0.272
5      1.640       1.570     2.810     2.120

The standard error arrived at in this way is that of analyses made within a
comparatively short period of time and does not take account of the variation of
the “instantaneous mean” which we have just been observing. It is therefore the
correct measure of the error if we wish to compare such analyses with each other,
but is too small if the analyses were separated by a wide interval. On the other
hand, the standard error derived from 100 analyses spread over three months is
too large when we are dealing with the differences between consecutive analyses.
The difficulty can only be removed by reducing the secular variation to negligible
limits.

Perhaps it would be well to illustrate this point in some further detail. Suppose 
a merchant to be offered two samples of grain at the same price: as far as he can 
judge they are of equal value but he is uncertain whether the moisture is the same. 
He gets them analysed and is returned the figures 14 % for sample A and 15 % for
sample B. If the standard deviation of the error is 0.2 %, clearly he should
purchase A; if the error were 4 %, it would not much matter which he bought.
But observe in this case, as in many others, he is really only concerned with the 
difference between A and B, and if he controls the analysis he will get them done 
alongside each other so as to avoid their being affected by semi-constant error, 
and the error of the analysis will be about that found from 100 differences. 
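The merchant's two cases reduce to one line of arithmetic: the difference of two independent analyses, each with error s.d. σ, has s.d. σ√2. The sketch below (hypothetical function name) expresses the 1 % gap between A and B in those units for the two error levels quoted, assuming the two analyses have independent errors.

```python
import math


def difference_in_sd_units(value_a, value_b, sd_single):
    """Express the gap between two single analyses in s.d. of a difference.

    Assumes the two analyses have independent errors, each with s.d. sd_single.
    """
    sd_diff = sd_single * math.sqrt(2)
    return abs(value_b - value_a) / sd_diff


for sd in (0.2, 4.0):
    z = difference_in_sd_units(14.0, 15.0, sd)
    print(f"error s.d. {sd} %: A and B differ by {z:.2f} s.d. of a difference")
```

With an error of 0.2 % the gap is about three and a half times its own standard deviation, so A is very probably the drier; with 4 % it is under a fifth of one, and the analysis tells him practically nothing.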

On the other hand, suppose he has bought a cargo of grain and an analysis tells 
him that the moisture is 17 % while it is common knowledge that 17.5 % is the
highest moisture at which grain will keep. Here he is not concerned with relative 
but with absolute values and the error now includes the semi-constant error, so that 
the value deduced from the 100 analyses spread over a long time is the better. 
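A small simulation shows why the two estimates part company. It assumes, purely for illustration, that the laboratory's semi-constant error drifts as a slow sine wave over the period, while both members of each pair are analysed on the same day and so share that day's drift. The amplitudes, the number of days and the names are hypothetical.

```python
import math
import random


def compare_error_estimates(days=1000, sd_short=0.2, drift_amp=0.3, seed=1):
    """Return (s.d. from same-day pair differences, s.d. of single analyses).

    Each day's analyses share a slowly drifting bias (the semi-constant error)
    plus independent short-period error with s.d. sd_short.
    """
    rng = random.Random(seed)
    singles, diffs = [], []
    for day in range(days):
        # Hypothetical drift of the "instantaneous mean" over the whole period.
        bias = drift_amp * math.sin(2 * math.pi * day / days)
        a = bias + rng.gauss(0, sd_short)
        b = bias + rng.gauss(0, sd_short)
        singles.append(a)
        diffs.append(a - b)  # the shared bias cancels in the difference
    mean = sum(singles) / days
    sd_long = math.sqrt(sum((x - mean) ** 2 for x in singles) / (days - 1))
    sd_pairs = math.sqrt(sum(d * d for d in diffs) / days / 2)
    return sd_pairs, sd_long


sd_pairs, sd_long = compare_error_estimates()
print(f"from pairs: {sd_pairs:.3f}   from analyses spread over time: {sd_long:.3f}")
```

Under these assumptions the pair estimate recovers roughly the short-period error (0.2), while the spread of single analyses also absorbs the drift and comes out near √(0.2² + 0.3²/2) ≈ 0.29: the first is the yardstick for comparing A with B, the second for an absolute figure such as the moisture of the cargo.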

As a sort of corollary of the existence of semi-constant errors in the same 
laboratory, we find that different laboratories have different constant errors, and 
a wise man will always consult the same analyst and not be troubled overmuch 
if a second analyst does not exactly agree with him. 

I have now, I hope, shown that routine analyses are subject to errors of which 
it behoves the head of the laboratory to be well aware. He may then judge 
whether his analyses are sufficiently accurate to bear the weight of any actions 
which it may be proposed to base upon them, and if not, how many repetitions 
will suffice to make them so; he will realize that an analysis made elsewhere is not 
necessarily less valuable than his own because it does not agree absolutely with it, 
and he will be in a better position to set about improving the details of his 
methods than if he were ignorant of the magnitude of his errors. 
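To judge how many repetitions will suffice, the usual rule is that the mean of n analyses with independent errors has s.d. σ/√n. A minimal sketch follows; the function name and the target are hypothetical, and because it assumes independent errors it applies only to the short-period error, since repetitions made together all share the same semi-constant error.

```python
import math


def repetitions_needed(sd_single, target_sd):
    """Smallest n for which sd_single / sqrt(n) does not exceed target_sd.

    Assumes the n analyses have independent errors; a semi-constant error
    common to all of them is not reduced by repetition.
    """
    return math.ceil((sd_single / target_sd) ** 2)


# Example: error s.d. 1.048 (item 1, 1905) and a hypothetical target of 0.5.
print(repetitions_needed(1.048, 0.5))
```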
I now turn to the particular point raised by Dr Pearson’s paper. It will be
realized from what has gone before that important analyses may have to be 
repeated and the same applies of course to those which have given results at 
variance with a priori expectation. Very important results may even have to be 
repeated more than once, and it is only natural to regard these pairs—triplets or 
quartets—with suspicion if the results are not “concordant”, i.e. have a wide 
range. 

The result is that there is a tendency to make further repetition in such cases, 
to reject discordant results, and to accept the mean of the remaining observations: 
all the same this instinctive distrust of width of range needs some justification, 
and, if justified, some rules for repetitions. 

For if the error were normally distributed there would be no advantage in 
rejection; this follows from the fact that in normal distributions there is no
correlation between the square of the mean and the variance: similarly, in platykurtic*
distributions those samples with large variance even tend to have the more accurate 
means. Actually, however, many if not most routine analyses have a leptokurtic 
error system, possibly because the standard deviation as well as the mean is 
subject to variation with time, and in such cases rejection of outlying observations 
improves the accuracy of the mean; apart from this we are all fallible and the 
procedure takes account of blunders. 

* In case any of my readers may be unfamiliar with the term “kurtosis”, we may define
mesokurtic as “having β₂ equal to 3”, while platykurtic curves have β₂ < 3 and leptokurtic
β₂ > 3. The important property which follows from this is that platykurtic curves have shorter
“tails” than the normal curve of error and leptokurtic longer “tails”. I myself bear in mind
the meaning of the words by the above memoria technica, where the first figure represents
platypus, and the second kangaroos, noted for “lepping”, though, perhaps, with equal
reason they should be hares!
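The claim that rejection gains nothing when errors are normal, and pays when they are leptokurtic, can be checked by simulation. The sketch below draws many samples of four and computes the correlation between the squared error of the sample mean and the sample variance, for a normal error, a uniform (platykurtic, β₂ = 1.8) error, and a hypothetical leptokurtic error in which one analysis in twenty has five times the usual s.d. The distributions and names are illustrative assumptions. For reference, the covariance works out to (μ₄ − 3μ₂²)/n², so its sign follows that of β₂ − 3.

```python
import random


def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def mean_sq_vs_variance(draw, n=4, trials=20000, seed=0):
    """Correlation of (sample mean)**2 with sample variance; true mean is 0."""
    rng = random.Random(seed)
    sq_means, variances = [], []
    for _ in range(trials):
        xs = [draw(rng) for _ in range(n)]
        m = sum(xs) / n
        sq_means.append(m * m)  # squared error of the mean, as the true mean is 0
        variances.append(sum((x - m) ** 2 for x in xs) / (n - 1))
    return correlation(sq_means, variances)


def normal(rng):
    return rng.gauss(0, 1)


def uniform(rng):  # platykurtic
    return rng.uniform(-1, 1)


def contaminated(rng):  # leptokurtic: occasional "blunders" with 5 times the s.d.
    return rng.gauss(0, 5 if rng.random() < 0.05 else 1)


for name, draw in (("normal", normal), ("uniform", uniform), ("contaminated", contaminated)):
    print(f"{name:12s} {mean_sq_vs_variance(draw):+.3f}")
```

The correlation comes out near zero for the normal case, negative for the uniform and clearly positive for the long-tailed error; only in the last case does a wide range signal an inaccurate mean.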
The following table gives the values of β₂ for samples of the five items of
analysis which I have given before:

Table III

Item   100 analyses   Differences between    Differences between   Differences between
       of 1905        consecutive analyses   100 pairs in 1925     100 pairs in 1926
                      of 1905
1      3.1            2.7                    4.8                   5.1
2      3.5            2.9                    8.2                   7.4
3      2.3            2.6                    2.9                   3.2
4      2.9            2.7                    10.4                  16.2
5      10.0           5.5                    5.0                   7.1
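For readers who wish to compute entries like those of Table III, a sample β₂ is the fourth moment about the mean divided by the square of the second. The sketch below (hypothetical names) also gives the large-sample standard error of β₂ for normal samples, √(24/n), which is about 0.49 for n = 100; for long-tailed errors the sampling variation of β₂ is larger still.

```python
import math


def beta2(xs):
    """Sample beta_2 = m4 / m2**2, moments taken about the sample mean."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2


def beta2_se_normal(n):
    """Large-sample s.d. of beta_2 in samples of n from a normal population."""
    return math.sqrt(24 / n)


# Eight zero differences and two hypothetical blunders of +/-1: strongly leptokurtic.
print(round(beta2([0] * 8 + [1, -1]), 3), round(beta2_se_normal(100), 2))
```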


In this table the differences between the β₂’s of twenty years ago and those of
the present day are rather remarkable, and though with small samples such as
these the standard deviation of β₂ is enormous, I should hesitate to assert that
they are due to random sampling; I am inclined to think that there has probably
been a twofold change: (1) that the error of the great majority has decreased, and
(2) that, possibly owing to work being carried on at higher pressure, there is a
rather greater liability to blunders. In this way the standard deviation has remained
much the same but the kurtosis has increased. Be that as it may, the tendency
to leptokurtosis is apparent and repetitions are justified except in the case of No. 3,
which, as I mentioned before, indicates moisture. Here the kurtosis of the
difference between pairs is approximately “meso” while that of the 100 analyses
appears to be distinctly platykurtic; this is in accordance with another distribution
of moisture determinations which I have examined.

Why this should be I have no idea, but obviously if a normal error were 
superposed on an instantaneous mean which moves to and fro on, let us say, a sine 
curve, the resulting distribution would be platykurtic: something of this sort may 
have happened. 
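The sine-curve explanation can be made exact. Suppose the instantaneous mean is a sin θ, with θ spread evenly over a whole period, and a normal error of s.d. σ is added. Since E(sin²θ) = 1/2 and E(sin⁴θ) = 3/8, μ₂ = a²/2 + σ² and μ₄ = 3a⁴/8 + 3a²σ² + 3σ⁴, so that β₂ = 3 − (3a⁴/8)/μ₂², which is always below 3. A minimal Python version (hypothetical function name):

```python
def beta2_sine_plus_normal(amplitude, sd):
    """Exact beta_2 of amplitude*sin(theta) + normal error with s.d. sd.

    theta is taken as uniformly distributed over a whole period.
    """
    mu2 = amplitude ** 2 / 2 + sd ** 2
    mu4 = 3 * amplitude ** 4 / 8 + 3 * amplitude ** 2 * sd ** 2 + 3 * sd ** 4
    return mu4 / mu2 ** 2


for amp in (0.0, 0.5, 1.0, 2.0):
    print(amp, round(beta2_sine_plus_normal(amp, 1.0), 3))
```

With no drift the value is the normal 3; it falls steadily as the swing of the instantaneous mean grows relative to σ, towards 1.5 for the sine alone.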

Assuming, however, that discordant observations are to be repeated and if 
necessary rejected, it is obviously of advantage to work on a regular system, 
and since we do not know where the mean is I propose to use the range as 
follows: 

Let Wₙ be the limit such that, with a sample of n, the chance of obtaining a
greater range than Wₙ is p (say 0.05); then if wₙ, the actual range of a sample, be
greater than Wₙ, a repetition should be made. Let wₙ₊₁ be the range of the new
sample including the repetition; then if wₙ₊₁ < Wₙ₊₁ the mean of the n + 1 results
should be accepted. If, on the other hand, wₙ₊₁ > Wₙ₊₁, the most outlying observation
should be rejected, and if then the resulting wₙ < Wₙ the mean of these n
should be accepted; but if not, a further repetition should be made and the whole
n + 2 observations examined afresh, and so on until a sample of at least n is
obtained lying within the required limits. For example, we may have a quartet
of analyses
22.8, 23.5, 26.0, 26.6,

the values of Wₙ for this analysis (s.d. 0.675) being as follows:

W₄ = 2.4, W₅ = 2.6, W₆ = 2.7, W₇ = 2.8.

Here w₄ = 3.8, so we repeat and get 23.9. Then w₅ = 3.8 and we reject 22.8,
leaving w₄ = 3.1. We therefore repeat again, getting 25.5. Then we have w₆ = 3.8,
w₅ = 3.1 (rejecting 22.8) and w₄ = 2.5 (rejecting 26.6). Still another repetition
gives 25.0, and we reject in turn 22.8, 26.6 and 26.0, leaving 23.5, 23.9, 25.0 and
25.5, with a range of 2.0 and an average of 24.5, which we accept.
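The repetition rule can be sketched in Python. Two assumptions are made that the text does not spell out: the “most outlying” observation is taken to be whichever end value has the larger gap to its nearest neighbour (this reproduces the rejections made in the example), and the limits are the rounded Wₙ quoted for this analysis. The function names are hypothetical.

```python
def most_outlying(xs):
    """Index (0 or -1) in sorted xs of the end value farther from its neighbour."""
    return 0 if xs[1] - xs[0] > xs[-1] - xs[-2] else -1


def apply_rule(observations, limits, n):
    """Return the observations to be averaged, or None if a repetition is needed.

    limits maps a sample size k to W_k; n is the size of the original sample.
    The whole set is examined afresh each time, rejecting the most outlying
    value until the range falls below the limit or only n values remain.
    """
    xs = sorted(observations)
    while True:
        if xs[-1] - xs[0] < limits[len(xs)]:
            return xs
        if len(xs) == n:
            return None
        xs.pop(most_outlying(xs))


limits = {4: 2.4, 5: 2.6, 6: 2.7, 7: 2.8}
results = [22.8, 23.5, 26.0, 26.6]
for repeat in (23.9, 25.5, 25.0):
    print(results, "->", apply_rule(results, limits, n=4))
    results = results + [repeat]
accepted = apply_rule(results, limits, n=4)
print(results, "->", accepted, "mean", round(sum(accepted) / len(accepted), 2))
```

Run on the example, the sketch calls for a repetition at each of the first three stages, as the text does. At the last stage it stops at five observations, because w₅ = 26.0 − 23.5 = 2.5 is already below W₅ = 2.6, giving a mean of 24.78; the worked example instead carries the rejection on to four observations and a mean of 24.5.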

To obtain Wₙ, the curves giving the frequency distributions of range for samples
taken from a normal population were drawn from Pearson’s constants and the
limits at which p is 0.1, 0.05 and 0.02 were determined. This gives us limits for
samples of 2, 3, 4, 5 and 6, and between 6 and 10 we can interpolate with the aid of
Tippett’s values for 10, 20 and 60. Wₙ is of course given in terms of the standard
error calculated from samples of analyses such as I have instanced above.


Table IV

       p = 0.1   p = 0.05   p = 0.02
W₂     2.3       2.9        3.3
W₃     2.9       3.4        3.8
W₄     3.2       3.6        4.1
W₅     3.4       3.8        4.3
W₆     3.7       4.0        4.5
W₇     3.7       4.1        4.5
W₈     3.8       4.2        4.6
W₉     3.9       4.3        4.7
W₁₀    4.1       4.5        4.9
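Where Pearson's constants are not to hand, factors like those of Table IV can be approximated by brute force: simulate many samples of n from a normal population and read off the range exceeded with probability p. The sketch below does this with Monte Carlo sampling in place of the fitted curves used in the text (hypothetical names), so its figures should be close to, though they need not agree in the last place with, the table. Multiplying a factor by the s.d. of the analysis, 0.675 in the example above, gives Wₙ in the units of the analysis.

```python
import random


def range_factor(n, p, trials=20000, seed=0):
    """Monte Carlo estimate of the range, in units of sigma, exceeded with
    probability p by samples of n from a normal population."""
    rng = random.Random(seed)
    ranges = []
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ranges.append(max(xs) - min(xs))
    ranges.sort()
    # Empirical upper p point of the simulated ranges.
    return ranges[int((1 - p) * trials)]


sd = 0.675  # s.d. of the analysis in the worked example
for n in range(2, 11):
    factor = range_factor(n, 0.05)
    print(f"n = {n:2d}: factor {factor:.2f}, W_n = {factor * sd:.2f}")
```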


Fig. 4 gives a comparison of the distribution of range in samples of 4 calculated 
from Pearson’s constants with the actual distribution in samples of 4 which 
occurred in the ordinary course of business when an important series of analyses 
was being made: the item was that which I have indicated as (1) in the tables
of this paper.

It will be seen that while the general shape of the curve gives a fairly good fit
(P = 0.13 for 5 groups) there is excess at the tail end, showing the leptokurtic
nature of the distribution and the advantage of repetition. 

Fig. 4. Frequency curve showing expected and histogram showing actual number of
ranges of given size in samples of 4 for 100 trials.

To recapitulate, routine analyses are subject to errors of which an estimate can
be made either by a special analysis of a comparatively large number of samples
of the same material, or by considering the differences between pairs which occur
in the ordinary course of business. Owing to the fact that there is usually a
secular variation in the error, these will not in general give the same result, and
care must be exercised in the use of the standard deviation obtained. From such
determinations of error, combined with certain factors obtained from Dr Pearson’s
paper on the range of small samples, we have derived limits at which repetitions
should be made and beyond which outlying observations should be rejected.
A rule is given for the application of this procedure, but it should always be
remembered that such rules are to be regarded as aids to and not as substitutes
for common sense.

I should like to thank the authorities in charge of the laboratory who have 
allowed me to use their figures and several friends who have helped in the
preparation of the paper, particularly “Mathetes”, who has computed the equations and
drawn the figures for me. I am further indebted to Dr Hinchcliff of the Free 
State Department of Agriculture, who supplied me with information about the 
sugar beet. 
YIELD TRIALS

[Baillière’s Encyclopaedia of Scientific Agriculture (1931), p. 1342]

It is quite easy to produce new hybrids; from a single cross fertilization between 
two varieties of a cereal one can select thousands of strains, differing more or 
less in some character from all the others. Whether any particular strain is worth 
preserving will depend upon many things, but among them one is indispensable: 
the yield must be sufficiently high to make the crop profitable. 

Similarly new manures or combinations of manure are continually being 
proposed, and the one condition of their use is that the increase in yield which 
they provoke must be such as to pay for the cost of applying them. 

As improvements are nowadays not likely to be very great, it becomes necessary 
to estimate comparative yields very closely, and this is not as simple a matter 
as it may appear at first sight. 

In the case of selection of strains of high yield the difficulties are of two kinds: 

(1) That similar environmental conditions of weather, soil, etc., may evoke 
different responses in strains, even though nearly related, of the same race; one 
may be better suited than another by a light soil, or a dry summer, and so on. 

(2) Quite apart from characteristics of this kind, the soil on which the plants are 
grown is never uniform, so that differences in yield arise which have nothing 
to do with the strains which are being tested. Similar considerations apply to 
manurial and other trials. 

Clearly, difficulties of the first kind can only be surmounted by repeated trials 
in many seasons and in all relevant types of soil and situation; but until we have 
arrived at some method of estimating what error is introduced into our
conclusions by difficulties of the second kind and of reducing this error to manageable
dimensions, we are not in a position to say whether observed differences in yield 
are due to the different strains, to their differential response to their environment, 
or merely to chance variation in the soil on which they have been grown. 

In what follows it is proposed first of all to give a brief account of some of 
the methods of arranging yield trials which have been introduced during the 
past twenty-five years, then to indicate the general reasoning which enables us 
to estimate the degree of reliance which we can place on our results, and, finally, 
to work out two examples of the actual calculation of such estimates. 

Although it must have been recognized long ago that experiments to determine
comparative yields were not quite straightforward, the science of planning such
experiments had not been investigated until twenty-five years ago, and the practice
of the art is only now becoming general.

It is true that sound results had been obtained by long continued trials carried
out over a wide area with comparatively large plots, notably by the Danish
Royal Agricultural Society and by the Irish Department of Agriculture (Slutningsberetning
om Maltbyg- og Hvedeudvalgets Virksomhed Vedrørende Byg- og
Hvedeavlen, Chr. Sonne, Foredrag i Det Kgl. Danske Landhusholdnings-Selskab
den 1 April 1903; H. Hunter, The Barley Crop, Ernest Benn, London), but, on
the other hand, there was a tremendous amount of energy wasted on experiments
from which, as we now know, it was impossible to have reached reliable
conclusions.

To obtain decisive results in this way it is not only necessary to work on a 
very large scale (in the Irish work Archer and Goldthorpe barleys were compared 
fifty-one times over a period of six years), but the differences to be determined 
have to be comparatively large, for a single one-acre plot must exceed another 
by at least 25 % if it is to be considered significantly better. 

It was not, however, until 1910-11, with the publication of papers by Stratton
and Wood, Mercer and Hall, and Montgomery (T. B. Wood and F. J. M. Stratton,
“The interpretation of experimental results”, J. Agric. Sci. vol. iii, No. 4;
W. Mercer and A. D. Hall, “Experimental error of field trials”, J. Agric. Sci.
vol. iv, No. 2; E. G. Montgomery, “Variation in yield and methods of arranging
plots to secure comparative results”, Nebr. Agric. Expt. Stat. 25th Ann. Report,
and “Experiments in wheat breeding”, U.S. Dept. Agric. Bur. Plant Indust.
Bul. 269), that the real difficulties of the problem became fully apparent. Each
of these papers dealt with the yields on the component parts of an area of land.
Stratton and Wood dealt with … acre of mangolds in plots of … acre; Mercer
and Hall with 1 acre of wheat in …-acre plots, and also with 1 acre of mangolds
in …-acre plots; Montgomery, for two years in succession, with wheat grown
on the same … acre and harvested in …-acre plots.

In each case the area was chosen as being particularly uniform in appearance, 
in each case the yields showed unexpected variability. Further, this variability 
was not random (see section on Randomness), nor, on the other hand, was it, 
except in the very slightest degree, regular. There was, it is true, a general 
tendency for plots at one end or side of the area to give higher yields than those 
at the other, but the “contours of fertility”, though they existed, showed no 
exact parallelism (unpublished work by R. A. Fisher). 

This suggested at once, firstly, that great accuracy would be obtained if plots 
whose yields are to be compared were sited closely together so that chance 
variations in the soil fertility should be shared as equally as possible; and, 
secondly, that to obtain this close siting the plots must be kept as small as it is 
convenient to work with, especially if many variants are being tested. 
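The argument for close siting can be illustrated with a toy model of a uniformity trial. The sketch below assumes, purely for illustration, that fertility along a strip of plots follows a first-order autoregressive trend (neither random nor regular) plus independent plot-to-plot variation; the model, its parameters and the function names are hypothetical and are not taken from the papers cited. It compares the mean square difference between plots one, five and twenty places apart.

```python
import random


def fertility_strip(n_plots, rho=0.9, trend_sd=1.0, plot_sd=0.5, seed=0):
    """Hypothetical plot fertilities: AR(1) trend plus independent plot noise."""
    rng = random.Random(seed)
    trend, values = 0.0, []
    for _ in range(n_plots):
        trend = rho * trend + rng.gauss(0, trend_sd)
        values.append(trend + rng.gauss(0, plot_sd))
    return values


def mean_sq_difference(values, lag):
    """Mean square difference between plots `lag` positions apart."""
    diffs = [values[i + lag] - values[i] for i in range(len(values) - lag)]
    return sum(d * d for d in diffs) / len(diffs)


strip = fertility_strip(5000)
for lag in (1, 5, 20):
    print(lag, round(mean_sq_difference(strip, lag), 2))
```

Under this model neighbouring plots differ far less than distant ones, which is the case for comparing strains on plots sited close together.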
But, besides convenience in working, there is another limit to the smallness
of experimental plots.

This is due to the fact that the outside of a plot does not represent a field crop, 
since it is in contact with plants of some other variety, or subjected to another 
method of treatment, and experience has shown (T. A. Kiesselbach, “Plot 
competition as a source of error in crop tests”, J. Amer. Soc. Agron. vol. xi; 
E. S. Beaven, “Pedigree seed corn”, J.R.A.S.E., vol. lxx, 1909) that plants
growing alongside one another are in strong competition for both food and light. 
Nor can this difficulty be overcome by leaving unoccupied space between the 
plots, for even in this case the outside plants are only representative of the 
outside of a field where the plants are able to get excessive nourishment, and, 
of course, the outside forms a small part of a large field, but a very sensible 
proportion of a small plot. It is, therefore, necessary either to have the plots 
so large that the “border effect” is negligible, or to discard the outside rows and 
plants from the portion which is to be weighed. 

The first system of yield trials based on a realization of the foregoing facts was 
Dr Beaven’s “chessboard” system of square yard plots (E. S. Beaven, ibid.;
“Student”, “On testing varieties of cereals”, Biometrika, xv, pp. 271-93 [11]),
which was, in fact, in use at Warminster in 1909 before the publication of the 
three papers which have been cited. 

This system, which has become the standard method of comparing the yields 
of varieties of cereals under wire cages, derives its name from the fact that the 
plots are square. Each square measures 4 ft. along the side, and in it are sown 
eight rows of seeds at 6 in. between the rows, the seeds being planted 2 in. apart 
in the rows. At harvest, however, the two outside rows are rejected, and also 
plants in the 6 in. at both ends of the other rows. Thus interference with
neighbouring varieties is reduced to a very small amount.

The arrangement of the plots in the experimental area merits attention, and 
should fulfil the following conditions: 

(1) The mean position of each strain tested should be the same, to counteract
the effects of a possible “fertility slope”.

(2) Different plots of the same strain should be spaced so as not to be needlessly 
close to one another; each strain then shares as far as possible in the casual 
vicissitudes of the experimental area. 

Beaven’s own arrangement was as follows: 
the panel of forty plots being repeated as often as is considered necessary,
generally, in his case, four times. 

It will be seen that this evens up a fertility slope across the panel, but that the
earlier letters are more to the left than the later by a small amount (A averages
… of a square to the left of the centre line, H the same amount to the right of
it), and to correct for this, the present writer has suggested that alternate panels
should be reversed, H being written for A, and so forth. This contravenes
condition (2) at the places where the panels join, but not to a very serious extent.
Even so, this arrangement has been criticized on the ground that a regular pattern
makes it impossible to assume that the error of estimation is random.

Be this as it may, such an arrangement has two solid advantages for this 
particular purpose over partly random or controlled random arrangements, such 
as Fisher’s randomized blocks or Latin squares which are described below. 
These advantages are: (1) that the chances of mistake are lessened by a regular 
system, and such mistakes have been known to occur even with the most 
experienced workers; (2) that the use of such plots for observation purposes is 
very much facilitated by the ease with which a particular strain may be picked out. 

The chessboard arrangement has been of great practical service in testing 
barley hybrids, and, in fact, the two varieties of barley most popular at the 
present time in the British Isles, Plumage-Archer and Spratt-Archer (see Barley),
were both tried out in wire cages in this way, and found to be superior to their 
competitors before proceeding to trial on a larger scale. 

On the other hand, selections which have proved successful in the cage have 
not always succeeded in the field, though this is probably due to the fact that 
there is a difference between horticultural methods such as are there used and 
the ordinary procedure of agriculture. It may also be due to the wire covering, 
and experiments which Mr M. Caffrey is conducting at the Royal Albert Agricultural
College at Glasnevin, Dublin, in the open, may throw light on this point.

Before considering large-scale work, it is necessary to refer to “rod rows”, 
i.e. plots consisting of a single row of plants one rod in length. These have been 
used largely in American work (H. K. Hayes and A. C. Arny, “Experiments in 
field technique in rod row tests”, J. Agric. Res. xi, p. 399), but in their original 
form, though, of course, very convenient for purposes of observation, they are 
nearly useless for the determination of yield even when replicated many times. 
This follows from the fact noticed above, that the yield is due not only to the 
inherent quality of the seed, but also to the vigour or lack of vigour of its 
neighbours. 

In a modified form, where three or more rows of the same variety are grown
consecutively, and the outer rows rejected when determining yields, but retained
if necessary for use as seed, the rod row system can give useful results if sufficient
replications are made. Even so the area wasted by rejected border amounts to
a large proportion of the whole (67 % with three consecutive rows, 50 % with
four, as compared with 44 % in the chessboard), so that the method is not
recommended, except as a rough test at the stage where a large number of strains
or hybrids is to be cut down by wholesale discards, while at the same time as
much seed as possible is wanted for those selected for further trial (F. W. Hilgendorf,
“Plant breeding methods results”, N.Z. J. Agric. March 1928).
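The proportions of discarded border quoted above follow directly from the plot dimensions. The sketch below (hypothetical names) reproduces them, taking the chessboard as eight rows of 4 ft with the two outside rows and 6 in. at each end of the others rejected, and the rod rows as losing only their two outer rows, since rejection at the ends of rod rows is not mentioned.

```python
from fractions import Fraction


def discarded_fraction(rows_sown, rows_kept, length_sown=1, length_kept=1):
    """Share of the sown area not harvested for yield."""
    kept = Fraction(rows_kept, rows_sown) * Fraction(length_kept, length_sown)
    return 1 - kept


# Chessboard: 8 rows of 48 in.; 6 rows kept, each losing 6 in. at both ends.
chessboard = discarded_fraction(8, 6, 48, 48 - 2 * 6)
rod_rows_3 = discarded_fraction(3, 1)  # three rows, two outer rows rejected
rod_rows_4 = discarded_fraction(4, 2)  # four rows, two outer rows rejected

for name, share in (("chessboard", chessboard), ("3 rod rows", rod_rows_3), ("4 rod rows", rod_rows_4)):
    print(f"{name}: {float(share):.0%} discarded")
```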


Large-Scale Work 

Beaven’s Half-Drill Strip Method. We now come to methods of carrying out
experiments on an agricultural scale, and here, again, Beaven has introduced
a method which takes full advantage of the light thrown on the problem by the
papers cited above (E. S. Beaven, “Trials of new varieties of cereals”, J. Minist.
Agric. vol. xxix, Nos. 4 and 5 (1922); “Student”, loc. cit.).



Fig. 1 


In order to compare yields grown on areas as contiguous as possible, he took 
an ordinary seed drill of which he put the middle coulter out of action and 
divided the seed box into two halves. Seed of different strains having been put 
in the two halves, the drill is driven down the field, and wheeling at each end, it 
sows the strains as in Fig. 1. At harvest the outside drills of each half-drill strip,
those in contact with the other strain, were pulled up by hand and discarded
to avoid the “border effect”, and each half-drill strip cut separately. If the two
strains ripened simultaneously they were cut by a reaping machine, but, if not, 
one had to be cut by hand. 

Originally a machine was used which delivered a separate sheaf off each
… acre, but this procedure was criticized on the ground of lack of randomness,
and was afterwards found not to be necessary, the gain in the apparent accuracy 
over the plan of weighing only the totals of each half-drill strip not being found 
worth the additional trouble. 

The weight of each half-drill strip is then compared with its neighbour of the 
other strain, and the layout of long narrow plots placed closely together ensures 
that the error of the comparison shall be small. 
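In its simplest form, the comparison of each half-drill strip with its neighbour is a paired-difference calculation: the mean of the differences and its standard error. The sketch below shows only that simplest form, treating each pair of neighbouring strips as independent; the yields and the function name are hypothetical, and this is not necessarily how Beaven's or Student's own trials were analysed.

```python
import math


def paired_comparison(yields_a, yields_b):
    """Mean of (A - B) over neighbouring strips, with its standard error.

    Treats each A strip and its neighbouring B strip as one independent pair.
    """
    diffs = [a - b for a, b in zip(yields_a, yields_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean, math.sqrt(var / n)


# Hypothetical weights (lb) of six pairs of neighbouring half-drill strips.
strain_a = [102.0, 98.5, 101.2, 99.8, 103.1, 100.4]
strain_b = [99.1, 97.0, 98.8, 98.9, 100.2, 98.6]
mean, se = paired_comparison(strain_a, strain_b)
print(f"A exceeds B by {mean:.2f} lb per strip (standard error {se:.2f})")
```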
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JF° °° m P ensate fOT the probable fertility slope the series should begin and 
n ^ the same Stram ’ 811(1 the followin g precautions be taken: 

of Wei g ^ 0un ^ chos f n for the experiment must be free from periodic changes 
of level, such as those left by having been laid down to grass in “lands”, orjf 
present, the seed must be drilled across these “lands”. 

(2) The drills should run across those of the previous cultivation. 

, . j e *P enmentaI area should be surrounded by at least one drill of the 

same kind of crop as is being experimented with. 

(4) Great care must be taken when sowing to drive quite straight, so that the 

, W ° ftfae machme may run as nearly as possible in the same track 
direction^ 6 “ outside ” ^ « the last journey in the other 

(5) After harvest the drilling must be checked by measuring the distances 

cst j^ 'l r ° WS COrn ’ WWoh be a PPWciably different in the 

case of the two varieties owing to the horses pulling unequally, and this will 

favour that variety which has the wider gap. As, however, the gain will be 
approximate y proportional to the gain in area, allowance can be made for this, 
lo avoid this difficulty Beaven now uses a special drill as follows (W. H 

(1925))’ ReP ° rt ° n tnalS ° f f ° Ur barleyS J ' Nat ^ AgriC - B0L N °- 14 


[Diagram: spacing of the coulters on the special drill, between the two wheels.]


As before, the seed box is divided so as to take two strains, but the coulters
are spaced so that between each four rows, 6 in. apart, there is a wider gap of
9 in., enabling each four rows to be cut separately.

Half of the sets of four consist of two rows of each strain, and these are
discarded; the others are thus arranged in the same manner as before
(ABBAA...BA), and not only have the advantage that they are sown on
equal areas whether the drill be driven straight or no, but they are also flanked
by two discarded rows of their own kind, thus reducing interference to a minimum.

It will be seen that the half-drill strip can only compare two strains at a time;
if several are to be tested it is necessary to compare each with a “control”.
This is a serious limitation; nevertheless, the shape of the plots enables the
comparison to be made with great accuracy.

This arrangement of plots also has been criticized on the ground that it is not “random”.

The application of the principle of long and very narrow strips thus introduced 
by Beaven for cereals can, of course, be applied to root crops, but for manurial 
experiments there would be danger of the benefit of the manure straying to the 
neighbouring plot. For these, the plots must be wider, and the method of their 
arrangement has been made a special study by Dr Fisher, at Rothamsted. As
a result he has evolved (1) “Randomized blocks” and (2) the Latin square
(R. A. Fisher, Statistical Methods for Research Workers, Oliver and Boyd,
Edinburgh).

In the former he divides up the experimental area into blocks each of which will
contain one of each of the variants to be tested (varieties, manurial treatments
or whatever it may be). Within the block the arrangement is random, determined
by some such method as dice throwing. The advantage is that since all the area
of any one block is likely to be more uniform than the whole area, the trials within
that block will be affected to a less extent by variations in soil fertility than if
the plots were scattered about over the whole area, and yet the arrangement is
random. The disadvantage is that in practice it may happen that the particular
random arrangement adopted may result in one strain (or treatment) having
a more favourable mean position than another: in a majority of the random
blocks it may be on the north, and the north end more fertile than the south.
Such a possibility is allowed for in the subsequent calculation, but the general
effect is to introduce an unnecessary increase of uncertainty into the result: the
error is larger than need be.
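Fisher's randomized blocks can be laid out in a few lines, with a pseudo-random shuffle standing in for the dice. This is a sketch with hypothetical names; it gives each block one plot of every variant, in an independently random order within the block.

```python
import random


def randomized_blocks(variants, n_blocks, seed=None):
    """One list per block, each containing every variant once in random order."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        block = list(variants)
        rng.shuffle(block)  # plays the part of dice throwing within the block
        layout.append(block)
    return layout


for number, block in enumerate(randomized_blocks("ABCD", 5, seed=3), start=1):
    print(f"Block {number}: {' '.join(block)}")
```

Nothing in the procedure stops one variant from landing at the same end of most blocks, which is the disadvantage described above.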

To meet this Fisher evolved the Latin square, where the mean position of
each variety is situated in the same place by making it occur in each row and in
each column of a “square”, repeating the “squares” as often as may be required.
Thus four strains (A, B, C and D) might be arranged in a square thus:



Fig. 2 

the actual position being obtained by dice throwing with increasing limitation
as, for example, after the top row has been fixed no further plot in the first
column can be A, and so on.
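The “dice throwing with increasing limitation” can be imitated by drawing each row at random and drawing it again whenever it would repeat a letter already placed in the same column. A row that fits always exists, so the process finishes, though it becomes slow for large squares. This is a sketch under those assumptions, with hypothetical names, and it is not claimed to give every possible square the same chance.

```python
import random


def latin_square(symbols, seed=None):
    """Build a Latin square row by row with column restrictions.

    Each row is a random ordering of the symbols; a row that repeats a symbol
    already used in some column is rejected and drawn again.
    """
    rng = random.Random(seed)
    n = len(symbols)
    rows = []
    while len(rows) < n:
        candidate = list(symbols)
        rng.shuffle(candidate)
        if all(candidate[c] != row[c] for row in rows for c in range(n)):
            rows.append(candidate)
    return rows


for row in latin_square("ABCD", seed=1):
    print(" ".join(row))
```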

This most ingenious arrangement is ideal from the point of view of interpretation,
but care should be taken not to have the plots so large that they do not lie
closely together, or the error of the results, accurately though it is estimated,
may be so large as to make the experiments inconclusive.

The Latin square need not, of course, be square; it will be of the same shape 
as any of its constituent plots. Thus, in the case of potatoes, where there is some 
evidence (R. N. Salaman, “ The determination of the best method for estimating 
potato yields”, etc., J. Agric. Sci. xiii (1923), p. 361) that the border effect is
negligible, the width of each “plot” might be a single drill, and the length (for
four strains) one-quarter the length of the field; a number of such “squares”
side by side would doubtless give results subject to a very small error.

At first sight it might seem that cereals might be tested by a combination of 
Beaven’s half-drill strip with the Latin square, but the practical difficulty of the 
time taken to clear out the seed boxes after every strip, or alternatively of driving 
the drill straight enough to be able to drill in one variety at a time, and then fill 
in with the others afterwards, seems to be insuperable, unless a drill with spare
boxes and arrangements for changing them quickly could be devised for the 
purpose. In any case, room for turning the horses would have to be left between 
the ends of the plots. 

But besides the technical difficulties, it may be impossible to use the Latin 
square for lack of room, since the repetitions must equal the number of varieties 
or treatments to be tested. In such cases it is often possible to use equalized 
random blocks,* which enable the principle of the Latin square to be used without 
the very large number of repetitions. For example, the writer was able to suggest 
an arrangement to a horticultural experimenter who wished to compare ten 
treatments with five plots of each. He was anxious to use the Latin square, but 
realized that if he used two Latin squares the two sets of five treatments would 
not be properly comparable. 

The proposed arrangement included five randomized blocks, but, whereas the
first was completely random, each further successive block had its randomness
more and more controlled, just as is each successive row in a Latin square.

[Figure: the proposed arrangement of the ten treatments, A to J, in Blocks I to V]

It will be seen that each column can equally be considered a “block” and that,
with one small exception, it is as “equalized” as a Latin square: a fertility slope,
therefore, either in the direction of the rows or of the columns, does not introduce
errors, and the error of a comparison will be correspondingly reduced. The
exception is that, owing to there being an odd number of blocks, A, D, E, F and J
occur in the top row of their block three times and in the lower row twice, and
vice versa with the others.

* I have seen no account of work planned in this way, but it is an obvious application of
Fisher’s methods.

To sum up, the following methods of testing yields have been described: 

On the small scale: 

(a) Beaven’s chessboard. 

(b) Rod rows. 

On the large scale: 

(c) Beaven’s half-drill strip. 

Applicable to either large or small scale: 

(d) Fisher’s randomized blocks. 

(e) Fisher’s Latin square and its modification, equalized random blocks.

In the foregoing all reference to two methods—namely, the use of control 
plots, and the estimation of yield from samples taken from the plots instead of 
by harvesting the whole plots—has been omitted. 

The former method was never very satisfactory (R. Summerby, “Accuracy 
in field experiments”, J. Amer. Soc. Agron. vol. xvii, No. 3), and it was quite 
usual for “corrections” based on the “control” plots to increase the error of 
comparisons: it has now been superseded by the methods outlined above. 

The latter method, on the other hand, has not yet been fully worked out, 
though it appears likely that in some cases it will become the ordinary way of 
estimating yield (F. L. Engledow, “A census of an acre of corn”, J. Agric. Sci. 
xvi (1926), p. 191; A. R. Clapham, “The estimation of yield in cereal crops by 
sampling methods”, J. Agric. Sci. xix, p. 214; J. Wishart and A. R. Clapham, 
“A study in sampling technique: the effect of artificial fertilisers on the yield of 
potatoes”, J. Agric. Sci. xix, p. 600). Care should, of course, be taken to discount 
any sources of error, such as loss of corn from shattered ears, which may take 
place in one variety on the large scale, but not in samples cut by hand. 

Statistical Interpretation of Results 

For an adequate exposition of the methods of statistical analysis the reader is
referred to treatises on the subject (R. A. Fisher, loc. cit.), but an indication of
how it comes about that we must invoke the aid of mathematics to make the
most of our experiments may not be amiss.

To take a very simple case: Suppose A and B are compared in 1927 and 1928,
also C and D, and the following results obtained:

               Yield per acre in cwt.
Year         A       B       C       D
1927        20      23      20      29
1928        19      24      21      20
In each case there is an average difference of 4 cwt. between the pairs, yet most
people would probably conclude instinctively that more reliance could be placed
on the comparatively concordant differences between A and B than on the
discordant results for C and D. We therefore tend to give weight to concordance.

Again, suppose a further experiment in 1929 gave A 22, B 26; we should probably
feel satisfied that our conclusion that B is better yielding than A is strengthened;
we therefore put more reliance on increased repetition of experiments. But suppose
that instead we had repeated the C D comparison and obtained C 18, D 22: would the
three C D comparisons which include the discordant 1928 result be better than
the two A B comparisons, concordant though they were?

Clearly we cannot answer questions of this kind by unaided common sense,
but fortunately mathematicians have dealt with the evidential value of events
of this nature, and we can use the methods and tables of the theory of chance,
provided always we make certain that the fundamental condition of applying
the theory, namely, that the events with which we deal are random, is adequately
satisfied.

Randomness. In view of this proviso it is important to have a clear idea of 
what constitutes randomness, and this is by no means easy. 

In our particular case a series of yields would be random if the value of each 
of them in relation to those of the others were quite independent of its position 
in time and place relative to them. 

As mentioned above, this does not happen in practice; yields of plots situated 
close to one another are more alike than those far apart, and in particular there 
is a general tendency for yields to increase or decrease as we go from one end or 
side of the experimental area to the other. It is therefore necessary to arrange 
the positions in such a way as to superpose randomness upon the biased fertility 
of the soil. This may be done in two ways: either the positions may be assigned 
by one of the recognized methods of invoking chance—dice throwing, coin 
tossing, card drawing, and the like—or a regular pattern may be devised which 
will equalize the more probable variations in fertility, but which will yet be 
sufficiently complicated for it to be a matter of chance how the residual variations 
may affect any particular comparison. 

Thus in Fisher’s randomized blocks each treatment is repeated once per block,
ensuring that each shall be equally affected by such variation as is common to
the block, but the position within the block is determined by chance; similarly,
the variation in fertility common to the plots which make up any row or column
of a Latin square is equally shared, but the positions in the row or column are
determined by chance.
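The randomized-block rule just described puts every treatment once in each block and settles its position within the block by chance. That amounts to an independent shuffle for each block. The Python sketch below is illustrative only. The function name `randomized_blocks` and the use of a pseudo-random generator in place of dice or cards are our assumptions.

```python
import random


def randomized_blocks(treatments, n_blocks, rng=random):
    """Each block contains every treatment once, in an order settled by chance."""
    return [rng.sample(list(treatments), len(treatments)) for _ in range(n_blocks)]


for i, block in enumerate(randomized_blocks("ABCDE", 4, random.Random(1929)), start=1):
    print(f"Block {i}:", " ".join(block))
```

Unlike the Latin square sketch, nothing here constrains where a treatment falls relative to the other blocks.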

Beaven’s chessboard system, on the other hand, depends on a regular arrangement,
but one which is sufficiently complicated for the claim to be made that,
by imposing it upon the ordinary variation in the soil, we get in fact a randomness
in the residual variation from the mean of each small group of consecutive plots,
which enables us to take advantage of mathematical analysis. Nevertheless, we
must be careful to avoid arrangements in which the mean position of the different
strains is not the same for all, and also such an order as:

    A E D C B
    B A E D C
    C B A E D
    D C B A E
    E D C B A

where a possible crest or trough of fertility parallel to the diagonal of A’s might
improve or depress the yield of one variety, without any warning being given
by the calculation of a large error of observation from the observations.

Beaven’s half-drill strip is, essentially, in a rather different position. Its pattern
is of the simplest; it can only vary between A B B ... B B A and A BB ... AA B,
of which the former must be chosen in case there may be, as is probable, a
“fertility slope” across the drills. The extreme length and narrowness, however,
ensure that the difference between adjacent half-drill strips is otherwise random
on ordinary soils, but it is necessary to avoid possible periodic variations in
fertility parallel to the drills. Thus, if the field had been laid up in ‘lands’, and
the drills were of such a width that the bottom of the land was always occupied
by the same strain, a non-random system of error would be included which would
vitiate the result. Similarly, a periodic variation in fertility might be left by
previous cultivation, and to avoid this, Dr Hilgendorf of the Canterbury Agricultural
College, New Zealand, drills diagonally across previous cultivation.

Analysis of Variance. With suitable precautions, then, all these arrangements
give results which can be treated by the methods of the Theory of Chance, and,
as it happens, the particular method (“The Analysis of Variance”) introduced
by Fisher, primarily for this purpose (R. A. Fisher, loc. cit.), can be applied to
all of them, and I have limited my discussion of the Theory of Chance to such
considerations as seem to me to be necessary to an understanding of this method.

If, then, an experiment is repeated several times, we get as a rule as many 
different results, though by chance some may be identical. If these results are 
random, we may attach more weight to their mean value the more numerous 
and the more concordant they are. 

The measure of the weight to be given to a Mean is the Standard Deviation
(s.d. or σ), which is derived as a rule from the results themselves by the following
procedure:

The difference between each result and the Mean is squared, and the
sum of these squared differences is divided by the number of experiments less one.
The quotient is called the “Variance” of the results, and the square root of the
Variance is the Standard Deviation.

The Variance of the Mean is obtained by dividing the Variance of the results
by their number, and, as before, the Standard Deviation of the Mean is the
square root of its Variance.

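In algebraic notation, if $x_1, x_2, \ldots, x_n$ be $n$ experimental results, and $\bar{x}$ their
mean, then

$$\text{the Variance} = \frac{S(x - \bar{x})^2}{n - 1},$$

$$\text{the Standard Deviation, } \sigma = \sqrt{\frac{S(x - \bar{x})^2}{n - 1}},$$

$$\text{the Variance of the Mean} = \frac{S(x - \bar{x})^2}{n(n - 1)},$$

$$\text{and the Standard Deviation of the Mean} = \sqrt{\frac{S(x - \bar{x})^2}{n(n - 1)}}.$$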
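The four quantities just defined can be computed directly. The Python sketch below applies them to the year-by-year differences from the 1927 and 1928 yields given earlier (B minus A, and D minus C). The function name `summarize` is our own. Both pairs have a mean difference of 4 cwt. However, the s.d. of the mean is 1 for the concordant A, B pair and 5 for the discordant C, D pair. This puts a number on the instinct for concordance.

```python
import math


def summarize(results):
    """Mean, Variance, s.d., Variance of the Mean and s.d. of the Mean,
    dividing the sum of squared deviations by n - 1 as described above."""
    n = len(results)
    if n < 2:
        raise ValueError("at least two results are needed")
    mean = sum(results) / n
    ss = sum((x - mean) ** 2 for x in results)  # S(x - mean)^2
    variance = ss / (n - 1)
    variance_of_mean = variance / n
    return mean, variance, math.sqrt(variance), variance_of_mean, math.sqrt(variance_of_mean)


# Differences B - A and D - C from the 1927 and 1928 yields (cwt. per acre)
print("B - A:", summarize([23 - 20, 24 - 19]))
print("D - C:", summarize([29 - 20, 20 - 21]))
```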

Having obtained the s.d., we can, by referring to tables constructed for this
purpose, find the chance that the mean of an infinite number of repetitions under
the same conditions would differ by less than any given amount from the mean
of the few experiments which we have made. Thus a difference as large as, or
larger than, once the s.d. of the mean of a large number of results occurs 16 times
in 100 such series of experiments; as large as, or larger than, twice the s.d. about
2·3 in 100; as large as, or larger than, three times the s.d. only about 0·13 times
in 100, and thereafter its rarity increases very rapidly.

We can thus judge of the value of our evidence, and as in other matters, the
degree of accuracy which we demand will depend on the importance of the action
to be taken relative to the cost of repeating the experiments.

For many purposes a probability of twenty to one is considered sufficient
to justify drawing a conclusion, and a result which leads to such a probability
is often conventionally called “significant”. This corresponds to a quantity
1·65 times the s.d., when the s.d. is known accurately.
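The figures in the last two paragraphs can be reproduced from the normal curve, which applies when the s.d. is known accurately. We read them as one-directional (upper-tail) chances. That reading is our interpretation, but it gives back the text's 16, 2·3 and 0·13 in 100, and the 1·65 limit for one chance in twenty. The sketch below uses `NormalDist` from Python's standard `statistics` module. The helper name `chance_per_hundred` is hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def chance_per_hundred(k):
    """How often, per 100 series, a deviation of k s.d. or more in one direction occurs."""
    return 100 * (1 - STD_NORMAL.cdf(k))


for k in (1, 2, 3):
    print(f"{k} x s.d.: about {chance_per_hundred(k):.2f} in 100")

# Deviation exceeded in one direction only once in twenty trials (s.d. known exactly)
print(f"one in twenty: {STD_NORMAL.inv_cdf(0.95):.3f} x s.d.")
```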

Now it can be shown that, provided randomness has been observed, variances
are additive. If one set of causes, say the innate differences in fertility between
the strains which are being tested (variance $V_s$), act simultaneously with another
set of causes, say random errors of the plots which are being tested (variance $V_e$),
then if $V_t$ be the total variance of the yields,

$$V_t = V_s + V_e.$$
Similarly, if we can arrange the plots as in randomized blocks, or the sets of
Beaven’s chessboard, so that part of the variation is common to the blocks or
sets (variance $V_b$) and part random ($V_e$ as before), then

$$V_t = V_s + V_b + V_e.$$

Or in the Latin square:

$$V_t = V_s + V_{\text{rows}} + V_{\text{columns}} + V_e.$$
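The additivity of variances for independent causes can be checked by simulation. This Python sketch makes assumptions that are purely illustrative and not taken from the text. It draws hypothetical "strain" effects with variance 4 and independent "plot errors" with variance 2·25, both normally distributed. It then compares the variance of their sum with the sum of their variances.

```python
import random


def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


rng = random.Random(1931)  # fixed seed so the run can be repeated
N = 100_000
strain = [rng.gauss(0.0, 2.0) for _ in range(N)]  # hypothetical strain effects, V_s = 4
error = [rng.gauss(0.0, 1.5) for _ in range(N)]   # independent plot errors, V_e = 2.25
total = [s + e for s, e in zip(strain, error)]

V_s, V_e, V_t = map(sample_variance, (strain, error, total))
print(f"V_s = {V_s:.3f}, V_e = {V_e:.3f}, V_s + V_e = {V_s + V_e:.3f}, V_t = {V_t:.3f}")
```

If the two sets of causes were correlated, which is what a non-random arrangement would risk, a covariance term would appear and the simple sum would no longer hold.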


In any of these cases it is the differences between the strains which make up
$V_s$ about which we have to form a judgment, and it is $\sigma_e = \sqrt{V_e}$ with which we have
to measure the certainty.

Now $V_t$ is calculable, being the total variance of the yields, and $V_s$, $V_b$, etc., are
the variances of the means of the strains, of the means of the blocks, etc., so that
we can find $V_e$ by difference.

Degrees of Freedom. But before giving an example of the determination of V e 
by this method, it is necessary to introduce the reader to one other technicality. 

It will have been noticed that in calculating the variance the sum of the 
squares was divided not by n the number of results, but by one less than that 
number. The reason for this is that since we do not know what would be the 
mean value of an infinite number of results obtained under like conditions, we 
are driven to use the mean of the n results which we have, and it can be shown 
that this would necessarily give too low a result were we to divide by n. We are 
on the average right if we diminish that number by one. 
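The claim that dividing by n gives too low a result, while dividing by n − 1 is right on the average, can be seen in a short simulation. The Python sketch assumes, for illustration only, repeated experiments of 4 results each drawn from a normal population with true variance 1. It averages both versions of the variance over many such experiments.

```python
import random

rng = random.Random(1908)  # fixed seed for a repeatable run
n, trials = 4, 50_000      # experiments of 4 results each; true variance 1
mean_div_n = mean_div_n_less_1 = 0.0
for _ in range(trials):
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    m = sum(xs) / n
    ss = sum((x - m) ** 2 for x in xs)
    mean_div_n += ss / n / trials
    mean_div_n_less_1 += ss / (n - 1) / trials

print(f"dividing by n:     {mean_div_n:.3f}")         # tends to (n - 1)/n = 0.75
print(f"dividing by n - 1: {mean_div_n_less_1:.3f}")  # tends to the true value 1
```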

Now for any given mean it is only possible to vary $n - 1$ of the results; the
last one is fixed by the mean and the other $n - 1$; hence, there are said to be
$n - 1$ degrees of freedom. If in addition to the mean of the whole we also calculate
the mean of a group, say the yield of the plots of a given strain, the number
of results which can vary is again diminished by one; there are now but $n - 2$
degrees of freedom, and similarly for each such mean. But it should be noticed
that if the general mean and the means of all the strains but one are calculated,
the remaining one is now fixed, i.e. the means of the strains, too, have one degree
less freedom than the number of the strains.

In this way the original $n - 1$ degrees of freedom may be allotted to the different
variances with which we are dealing, each variance accounting for one less than
the number of categories from which it is calculated, and the balance is left for
the calculation of the random variation. Thus, in a Latin square in which five
strains are tested in twenty-five plots there are

    5 strains taking up 4 degrees of freedom,
    5 rows taking up 4 degrees of freedom,
    5 columns taking up 4 degrees of freedom,

so that of the original 24 degrees of freedom only 12 are left for the calculation
of the variance due to random error.
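The bookkeeping just described generalizes to any k × k Latin square. Each of strains, rows and columns takes k − 1 degrees of freedom out of k² − 1, and the balance (k − 1)(k − 2) is left for random error. A small Python sketch follows. The function name is our own.

```python
def latin_square_degrees_of_freedom(k):
    """Allot the k*k - 1 degrees of freedom of a k x k Latin square."""
    total = k * k - 1
    strains = rows = columns = k - 1          # one less than the number of categories
    error = total - strains - rows - columns  # the balance, equal to (k - 1)(k - 2)
    return {"strains": strains, "rows": rows, "columns": columns,
            "random error": error, "total": total}


print(latin_square_degrees_of_freedom(5))
print(latin_square_degrees_of_freedom(4))
```

For the four-strain square of the following example this leaves 6 degrees of freedom for random error.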
We are now in a position to calculate a very simple numerical example.
Suppose we had arranged plots of four strains (A, B, C and D) in a Latin
square as follows:

    A   B   C   D
    C   D   A   B
    D   C   B   A
    B   A   D   C

and that the following yields had been obtained:

    5   8   4   2
    5   4   2   5
    3   6   3   5
    5   2   1   4

Can we say that the yield of B under the conditions of the experiment is
significantly better than that of D?

For simplicity the above yields have been chosen so that the general mean is 
a whole number, 4, and in working we may rewrite the yields as differences from 
that number, thus: 


                                 Sums of rows    Average
    +1    +4     0    −2              +3            +¾
    +1     0    −2    +1               0             0
    −1    +2    −1    +1              +1            +¼
    +1    −2    −3     0              −4            −1

Sums of columns:   +2    +4    −6     0
Average:           +½    +1   −1½     0

Also:

    A = +1 − 2 + 1 − 2 = −2,   average −½
    B = +4 + 1 − 1 + 1 = +5,   average +1¼
    C =  0 + 1 + 2 + 0 = +3,   average +¾
    D = −2 + 0 − 1 − 3 = −6,   average −1½

The total variance then is the sum of the squares divided by the 15 degrees of
freedom; or fifteen times

$$15\,V_t = 1 + 16 + 0 + 4 + 1 + 0 + 4 + 1 + 1 + 4 + 1 + 1 + 1 + 4 + 9 + 0 = 48.$$

The contribution made to this by the variance of the strains is ¼ for each A,
25/16 for each B, and so forth, or

$$4\left(\tfrac{1}{4} + \tfrac{25}{16} + \tfrac{9}{16} + \tfrac{9}{4}\right) = 18\tfrac{1}{2}.$$

It will be seen that this result can be arrived at more easily by taking ¼ of the
squares of the sums, i.e.

$$\tfrac{1}{4}(4 + 25 + 9 + 36) = 18\tfrac{1}{2};$$

similarly, the contribution made by the variance of the rows is

$$\tfrac{1}{4}(9 + 0 + 1 + 16) = 6\tfrac{1}{2}.$$
And the contribution made by the variance of the columns is

$$\tfrac{1}{4}(4 + 16 + 36 + 0) = 14.$$

These facts are then set out in a table as follows:

Variance due to    Degrees of    Sums of     Variance    Standard
                   freedom       squares                 deviation
Strains                3           18½         6·17
Rows                   3            6½         2·17
Columns                3           14          4·67
Random error           6            9          1·5         1·225
Total                 15           48          3·2

the degrees of freedom and the sum of the squares due to random error being 
obtained by differences between the total and the sum of the other three; the 
variance is obtained by dividing the third column by the second, and the standard 
deviation by taking the square root of the variance. 
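The whole table can be checked mechanically. The Python sketch below works from deviations from the general mean, as the text does, and takes 1/k of the squares of the sums (a quarter, since k = 4) for strains, rows and columns. Random error is found by difference. The layout and yields are those implied by the worked deviations, each yield being 4 plus its deviation. The function name is our own, and `fractions.Fraction` keeps the halves exact.

```python
import math
from fractions import Fraction

LAYOUT = ["ABCD", "CDAB", "DCBA", "BADC"]
YIELDS = [[5, 8, 4, 2],
          [5, 4, 2, 5],
          [3, 6, 3, 5],
          [5, 2, 1, 4]]


def latin_square_anova(layout, yields):
    """Degrees of freedom and sums of squares for a k x k Latin square,
    working with deviations from the general mean."""
    k = len(yields)
    grand_mean = Fraction(sum(map(sum, yields)), k * k)
    dev = [[y - grand_mean for y in row] for row in yields]
    total = sum(d * d for row in dev for d in row)
    # 1/k of the squares of the sums (a quarter for k = 4)
    rows = sum(sum(row) ** 2 for row in dev) / k
    columns = sum(sum(col) ** 2 for col in zip(*dev)) / k
    strain_sums = {}
    for letters, devs in zip(layout, dev):
        for letter, d in zip(letters, devs):
            strain_sums[letter] = strain_sums.get(letter, 0) + d
    strains = sum(s * s for s in strain_sums.values()) / k
    return {
        "Strains": (k - 1, strains),
        "Rows": (k - 1, rows),
        "Columns": (k - 1, columns),
        "Random error": ((k - 1) * (k - 2), total - strains - rows - columns),
        "Total": (k * k - 1, total),
    }


table = latin_square_anova(LAYOUT, YIELDS)
for source, (df, ss) in table.items():
    print(f"{source:<13}{df:>3}{float(ss):>8.2f}{float(ss / df):>8.3f}")
err_df, err_ss = table["Random error"]
print("s.d. of random error:", round(math.sqrt(float(err_ss / err_df)), 3))
```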

Now the average difference between B and D is 2¾, and the random variance
of each mean is 1·5/4, but since we are dealing with the difference between the
two, the variance of this difference is twice this, or 0·75, and the standard deviation
is √0·75, or 0·866. The difference between B and D is, therefore, 2·75/0·866 = 3·17
times the standard deviation, and is to be looked out between t = 3·1 and t = 3·2 in
the column headed by 6 degrees of freedom.

With standard deviations calculated from so few degrees of freedom, “Student’s”
tables must be used: these tables are given in full in Metron, vol. v (1925)
[12]; an abstract is given in Fisher’s Statistical Methods for Research Workers,
1st ed. p. 137, and they are also given in a somewhat less convenient form in
Biometrika, xi, p. 416 [8], and Tables for Statisticians and Biometricians, 2nd ed.
p. 63. If using the last two, the 3·17 must be divided by the square root of one
more than the degrees of freedom, and looked out under the column headed by
the same number, i.e. we look out 3·17/√7.

In any case we find the probability of obtaining a smaller difference by chance
to be about 0·99, i.e. it is 99 to 1 against getting such a large one, and we may
therefore suppose that the difference between B and D would again come out
in favour of B, if B and D were grown again under similar conditions.

On the other hand, the difference between C and A is only 1·25/0·866 times
its standard deviation, say 1·45, and on looking this up we find the probability
of obtaining a smaller result by chance is only 0·9, i.e. the odds are only 9 to 1
against getting such a large difference, and we cannot conclude that the difference
between C and A is due to the strains and not to the positions of the plots in
which they are grown.
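Python's standard library has no Student's t distribution. `scipy.stats.t.cdf` is the usual route where SciPy is available. For an even number of degrees of freedom, however, the distribution has a standard finite closed-form series, and 6 is even. The sketch below uses that series to reproduce the two probabilities just quoted. The helper name `t_cdf_even_df` is hypothetical, and the function deliberately refuses odd degrees of freedom, which would need a different formula.

```python
import math


def t_cdf_even_df(t, df):
    """P(T <= t) for Student's t with an EVEN number of degrees of freedom,
    using the finite series that exists in that case."""
    if df < 2 or df % 2:
        raise ValueError("this sketch only handles an even number of degrees of freedom")
    cos2 = df / (df + t * t)
    term = series = 1.0
    for j in range(1, df // 2):
        term *= cos2 * (2 * j - 1) / (2 * j)
        series += term
    return 0.5 + 0.5 * t / math.sqrt(df + t * t) * series


sd_difference = math.sqrt(2 * 1.5 / 4)  # random variance 1.5 per plot, 4 plots per mean
comparisons = {"B - D": 1.25 - (-1.5), "C - A": 0.75 - (-0.5)}
for label, difference in comparisons.items():
    t = difference / sd_difference
    print(f"{label}: t = {t:.2f}, P(smaller by chance) = {t_cdf_even_df(t, 6):.3f}")
```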

It should be noticed, however, that the formula which we have used in comparing
B and D is that which it is correct to use when there are but two means
to compare: it would be right to use it, for example, if we have a number of trials
of B and D in different places or seasons, and we wish to examine the whole series
of comparisons of B and D.

If, however, there is no particular reason why we should compare these two 
rather than any other pair, it is clear that the chance of obtaining a large difference 
between some pair or other is greater the larger the number of possible pairs. 
To meet this Fisher suggests that the mean of each strain should be compared
with the general mean, and that the strains should be divided in this way into

(a) those significantly greater than the mean;

(b) those not differing significantly from the mean; and

(c) those significantly less than the mean.

The appropriate standard deviation to use, if σ be the standard deviation of
the random error of a single plot, and there are n repetitions and m strains, is

$$\frac{\sigma\sqrt{m - 1}}{\sqrt{nm}}.$$

In this case n = m = 4, and the standard deviation is therefore 1·225 × √3/4 = 0·53.

Referring to the table, we find that the 20 : 1 limit corresponds to twice the
standard deviation (t = 2·0) for 6 degrees of freedom, and accordingly we may
divide the four strains as follows:

B, significantly better than the mean;

A and C, not different significantly from the mean; and

D, significantly worse than the mean.
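The division into three classes can be sketched in a few lines of Python. It uses the text's σ = √1·5, n = m = 4 and the limit of twice the standard deviation for 6 degrees of freedom. The deviations are the strain averages from the general mean worked out above. The function name `classify` is our own.

```python
import math

sigma = math.sqrt(1.5)   # s.d. of the random error of a single plot
n = m = 4                # repetitions and strains
sd = sigma * math.sqrt(m - 1) / math.sqrt(n * m)   # about 0.53
limit = 2.0 * sd         # t = 2.0 for 6 degrees of freedom, as in the text

deviations = {"A": -0.5, "B": 1.25, "C": 0.75, "D": -1.5}  # strain mean minus general mean


def classify(dev):
    if dev > limit:
        return "significantly better than the mean"
    if dev < -limit:
        return "significantly worse than the mean"
    return "not different significantly from the mean"


for strain, dev in deviations.items():
    print(strain, classify(dev))
```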

The above example, though “made up”, illustrates what commonly happens 
in practice—namely, that the variance of the rows and of the columns which 
we have neutralized in the arrangement of the Latin square is in each case 
greater than the Random Error, and we have therefore increased the precision 
of our experiment by this arrangement. 

It is not usual, however, to have an exact figure as the mean, and the following 
example of a half-drill strip experiment which was actually carried out at 
Ballinacurra, Co. Cork, in 1929, gives the procedure when measuring not from 
the mean, but, to avoid working with fractions, from some arbitrarily chosen 
origin. This trial compared a selection from Dr Hunter’s Spratt-Archer barley 
with his selected Archer in twenty-two half-drill strips each. It will be noticed 
that when Spratt-Archer was on the north, there was practically no difference,
but that it was markedly better when on the south; thus there was a fertility 
slope across the strips, and the two sets should be averaged separately at the loss 
of a degree of freedom. 

Naturally with only two varieties to compare we do not concern ourselves
with anything but the differences between corresponding strips, and to avoid
fractions, these differences are measured in ¼-lb. units. The mean is here fractional,
and to save arithmetic, an arbitrary point is chosen as origin. Now any arbitrary
origin may be chosen, but since the largest difference is +40 and the smallest
−24, the obvious origin is zero. [A useful exercise for the beginner would be to
take another origin (say +10), measure each difference from this, and work out
the example again; the same result should be obtained, and to facilitate an
exercise of this kind the figures which should be identical in any such comparison
are given in italics.

Thus, if +10 were chosen the differences would run:

+4, +25, −34, −6, −16, and so on.]
Table of Yields

              Yields                   Difference, S.A. − A.
   Spratt-Archer                   Spratt-Archer   Spratt-Archer
     37, No. 3        Archer          on North        on South
        lb.             lb.            ¼ lb.           ¼ lb.

        36½             33              +14
        41*             32*                             +35
        34              40              −24
        40              39                               +4
        37              38½              −6
        37*             36*                              +5
        38½             35              +14
        42¼             38½                             +15
        41              42½              −6
        43              40                              +12
        42¼             42               +1
        42              38½                             +14
        41              39¼              +7
        45              38½                             +26
        44¼             42               +9
        42*             40*                              +7
        40              39½              +2
        41              39½                              +6
        41              39               +8
        39*             41*                              −7
        36              39½             −14
        43              33                              +40

Sum     888½            848              +5             +157
                                              +162
Average  40·4           38·5                 +7·364


I owe these figures to the courtesy of the Irish Free State Department of Agriculture. 


We have first the sum of the squares of all differences:

196 + 1225 + 576 + 16 + 36 + 25 + 196 + 225 + 36 + 144 + 1 + 196
+ 49 + 676 + 81 + 49 + 4 + 36 + 64 + 49 + 196 + 1600 = 5676.

To correct for the arbitrary origin we subtract 22 times the square of the mean
distance from the origin, i.e. 22 × 7·364² = 1193, which leaves 4483.

The sum of the squares due to the North/South fertility slope is

(1/11)(5² + 157²) = 2243;

subtracting the same correction for the arbitrary mean, 1193, we get 1050.
We now arrange as before:

Variance due to                 Degrees of    Sums of     Variance    Standard
                                freedom       squares                 deviation
North/South fertility slope         1           1050        1050
Random error                       20           3433        171·65      13·1
Total                              21           4483        213·5


It will be observed that all the figures in this table are the same no matter 
what arbitrary origin is chosen, and this method is the same no matter how 
many different sources of variance are accounted for; the same correction (the 
square of the mean multiplied by the total number) is subtracted from the sum 
of squares due to each source and from the total. 
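The analysis of variance above, and the claim that it does not depend on the origin, can both be checked in Python. This sketch takes the differences (in ¼-lb. units) from the Table of Yields, split by which side Spratt-Archer was on. It subtracts the same correction, the square of the sum divided by the number of differences, from each sum of squares. It then runs the analysis with origin 0 and origin +10. The function name `analyse` is our own, and `Fraction` makes the comparison of the two runs exact.

```python
from fractions import Fraction

# Differences Spratt-Archer minus Archer, in quarter-pounds, from the Table of Yields
NORTH = [14, -24, -6, 14, -6, 1, 7, 9, 2, 8, -14]   # Spratt-Archer on the north
SOUTH = [35, 4, 5, 15, 12, 14, 26, 7, 6, -7, 40]    # Spratt-Archer on the south


def analyse(origin=0):
    """Analysis of variance of the differences measured from an arbitrary origin."""
    groups = [[Fraction(d - origin) for d in g] for g in (NORTH, SOUTH)]
    every = [d for g in groups for d in g]
    n = len(every)
    correction = sum(every) ** 2 / n  # n times the square of the mean distance from the origin
    total = sum(d * d for d in every) - correction
    slope = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    return {
        "North/South fertility slope": (len(groups) - 1, slope),
        "Random error": (n - len(groups), total - slope),
        "Total": (n - 1, total),
    }


for origin in (0, 10):
    print(f"origin {origin}:")
    for source, (df, ss) in analyse(origin).items():
        print(f"  {source:<28}{df:>3}{float(ss):>9.1f}{float(ss / df):>9.2f}")
```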

The mean difference is then 7·364 (in ¼-lb. units) in favour of Spratt-Archer 37, No. 3,
and the variance of the comparison is 171·65/22, giving a standard deviation of 2·79
(about 1·7 %), and we have

7·364/2·79 = 2·64.


Looking this out in the table under 20 degrees of freedom, we find the probability
0·9921; i.e. we should get a result in favour of Spratt-Archer 37, No. 3, as large as this
by chance if there were really no difference only 79 times in 10,000 trials, and
we may conclude that under the conditions of the trial Spratt-Archer is definitely
the higher yielding barley.
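Because 20 degrees of freedom is also an even number, the same finite-series form of Student's distribution reproduces the tabled 0·9921 without external libraries. The sketch below repeats that helper so that it stands alone. As before, `t_cdf_even_df` is our own name, and the series holds for even degrees of freedom only. The inputs are the text's mean difference (162/22 quarter-pounds) and random-error variance (171·65).

```python
import math


def t_cdf_even_df(t, df):
    """P(T <= t) for Student's t with an EVEN number of degrees of freedom."""
    if df < 2 or df % 2:
        raise ValueError("this sketch only handles an even number of degrees of freedom")
    cos2 = df / (df + t * t)
    term = series = 1.0
    for j in range(1, df // 2):
        term *= cos2 * (2 * j - 1) / (2 * j)
        series += term
    return 0.5 + 0.5 * t / math.sqrt(df + t * t) * series


mean_difference = 162 / 22            # quarter-pounds, in favour of Spratt-Archer
sd_mean = math.sqrt(171.65 / 22)      # random-error variance spread over the 22 pairs
t = mean_difference / sd_mean
p = t_cdf_even_df(t, 20)
print(f"t = {t:.2f}, P = {p:.4f}, about {10_000 * (1 - p):.0f} in 10,000 by chance")
```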

The following table gives the variances which are removed in finding the 
random error in the various methods described when calculating the degree of 
significance of the results by the analysis of variance: 
Yield Trials

General Observations 

While it is obvious that there is much to be gained by planning yield trials 
in such a way as both to reduce the experimental error and to obtain an accurate 
estimate of it, it is important to remember that conclusions can only be drawn
applicable to the particular conditions under which the trials were carried out.

For this reason, trials should be repeated season after season, and in so many
different places as to cover the probable variations in soil and climate in the
districts in which practical application is to be made.

When this has been done, the accuracy which our technique has enabled us 
to obtain will enable us to analyse the results, so as to find out whether some of 
our strains/treatments suit certain soils/seasons/climates better than others. 

Moreover, it is not enough to show that one strain/treatment is superior to 
another in yield; to achieve lasting success we must subject our product to tests 
for quality. 

Thus, the Irish barley trials culminated in malting and brewing tests, as also 
those now being conducted by the National Institute of Agricultural Botany; 
Biffen’s wheats were chosen for their baking strength, as well as yield, and so on. 
On the other hand, accounts of yield trials of potatoes rarely conclude with a 
table of moisture percentages and an estimate of relative palatability (F. Johnson 
and O’C. Boyle, “The industrial and nutritive value of the potato in Ireland”, 
J. Dept. Agric. for Ireland, vol. xviii, No. 4). This may help to account for the
modern potato. 

General References 

G. Udny Yule, Introduction to the Theory of Statistics, Griffin and Co., London;
F. L. Engledow and G. Udny Yule, “The principles and practice of yield trials”,
The Empire Cotton Growing Review, vol. iii, Nos. 2 and 3, Empire Cotton
Growing Corporation, Millbank, London; Fisher and Wishart, Imp. Bur. Soil
Science, Tech. Comm. No. 10, “The arrangement of field experiments and the
statistical reduction of the results”.

An excellent bibliography of papers, etc. is contained in W. Horton Beckett’s
“Methods of field experimentation”, 1928 Year Book of the Dept. of Agric.,
Gold Coast.
THE LANARKSHIRE MILK EXPERIMENT

[Biometrika, XXIII (1931), p. 398] 

In the spring of 1930* a nutritional experiment on a very large scale was carried 
out in the schools of Lanarkshire. 

For four months 10,000 school children received ¾ pint of milk per day, 5000 of
these got raw milk and 5000 pasteurized milk, in both cases Grade A (Tuberculin 
tested); another 10,000 children were selected as controls and the whole 20,000 
children were weighed and their height was measured at the beginning and end of 
the experiment. 

It need hardly be said that to carry out an experiment of this magnitude successfully
requires organization of no mean order and the whole business of distribution
of milk and of measurement of growth reflects great credit on all those concerned.

It may therefore seem ungracious to be wise after the event and to suggest that 
had the arrangement of the experiment been slightly different the results would 
have carried greater weight, but what follows is written not so much in criticism of 
what was done in 1930 as in the hope that in any further work full advantage may 
be taken of the light which may be thrown on the best methods of arrangement by 
the defects as well as by the merits of the Lanarkshire experiment. 

The 20,000 children were chosen in 67 schools, not more than 400 nor less than
200 being chosen in any one school, and of these half were assigned as “feeders”
and half as “controls”; some schools were provided with raw milk and the others
with pasteurized milk, no school getting both.

This was probably necessary for administrative reasons, owing to the difficulty 
of being sure that each of as many as 200 children gets the right kind of milk every 
day if there were a possibility of their getting either of the two. Nevertheless, as 
I shall point out later, this does introduce the possibility that the raw and 
pasteurized milks were tested on groups of children which were not strictly 
comparable. 

Secondly, the selection of the children was left to the head teacher of the school 
and was made on the principle that both “controls” and “feeders” should be 
representative of the average children between 5 and 12 years of age: the actual 
method of selection being important I quote from Drs Leighton and McKinlay’s* 

* Department of Health for Scotland: Milk Consumption and the Growth of Schoolchildren,
by Dr Gerald Leighton and Dr Peter L. McKinlay (Edinburgh and London: H.M.
Stationery Office, 1930).
Report: “The teachers selected the two classes of pupils, those getting milk and
those acting as ‘controls’, in two different ways. In certain cases they selected
them by ballot and in others on an alphabetical system.” So far so good, but
after invoking the goddess of chance they unfortunately wavered in their adherence
to her, for we read: “In any particular school where there was any group
to which these methods had given an undue proportion of well-fed or ill-nourished
children, others were substituted in order to obtain a more level selection.” This
is just the sort of after-thought that most of us have now and again and which
is apt to spoil the best laid plans. In this case it was a fatal mistake, for in consequence
the “controls” were, as pointed out in the Report,* definitely superior
both in weight and height to the “feeders” by an amount equivalent to about
3 months’ growth in weight and 4 months’ growth in height.

Presumably this discrimination in height and weight was not made deliberately, 
but it would seem probable that the teachers, swayed by the very human feeling 
that the poorer children needed the milk more than the comparatively well to do, 
must have unconsciously made too large a substitution of the ill-nourished among 
the “feeders” and too few among the “controls” and that this unconscious 
selection affected, secondarily, both measurements. 

Thirdly, it was clearly impossible to weigh such large numbers of children 
without impedimenta. They were weighed in their indoor clothes, with certain 
obvious precautions, and the difference in weight between their February garb 
and their somewhat lighter clothing in June is thus necessarily subtracted from 
their actual increase in weight between the beginning and end of the experiment. 
Had the selection of “controls” and “feeders” been a random one, this fact, as
pointed out in the Report,* would have mattered little, for both classes would have
been affected equally; but since the selection was probably affected by poverty it
is reasonable to suppose that the “feeders” would lose less weight from this cause
than the “controls”. It is therefore not surprising to find that the gain in weight
of “feeders” over “controls”, which includes this constant error, was more 
marked, relatively to their growth rate, than was their gain in height, which was 
fortunately not similarly affected. 
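The clothing argument can be put as simple arithmetic. The sketch below uses invented figures (in ounces) purely for illustration; it assumes only that the recorded gain is the true gain minus the clothing shed between February and June, and shows that the recorded advantage of the "feeders" then exceeds the true one by exactly the difference in clothing shed.

```python
def recorded_gain(true_gain_oz, clothes_shed_oz):
    """Gain recorded when children are weighed clothed in February and again,
    in lighter clothes, in June: the clothing shed is subtracted."""
    return true_gain_oz - clothes_shed_oz

# Invented figures in ounces, for illustration only.
true_feeders, true_controls = 40.0, 36.0   # genuine advantage of 4 oz.
shed_feeders, shed_controls = 8.0, 14.0    # poorer "feeders" discard less

true_advantage = true_feeders - true_controls
recorded_advantage = (recorded_gain(true_feeders, shed_feeders)
                      - recorded_gain(true_controls, shed_controls))
bias = recorded_advantage - true_advantage   # equals shed_controls - shed_feeders
print(true_advantage, recorded_advantage, bias)   # 4.0 10.0 6.0
```

Under random allocation the two groups would shed the same amount on average, and the bias term would vanish.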

Fourthly, the “controls” from those schools which took raw milk were bulked 
with those from the schools which took pasteurized milk. 

Now with only 67 schools, at best 33 against 34, in a district so heterogeneous 
both racially and socially, it is quite possible that there was a difference between 
the averages of the pupils at 33 schools and those of the pupils at another 34 schools 
both in the original measurements and in the rate of growth during the experiment. 

In that case the average “control” could not be used appropriately to compare 
with either the “raw” group or the “pasteurized” group. 

This possibility is enhanced by the aforementioned selection of “controls”,
which can hardly have been carried out in a uniform manner in different
schools.

* See footnote on p. 169.

Fortunately it would still be possible to correct this, for the figures for the 
different schools must still be available in the archives. 

Diagrams 1 and 2 give the average heights of “controls”, raw milk “feeders”
and pasteurized milk “feeders” for boys and girls respectively. The heights at
the beginning of the experiments are set out against a uniform age scale centring
each group at the half year above the whole number. This is doubtless accurate
enough except for the first group aged “5 and less than 6”, which was very much
smaller in numbers than the other groups, either because only the older (or larger)
children are sent to school between 5 and 6 or because the teachers did not think
that the smaller children would be able to play their part. For this reason they
should probably be centred more to the right compared to the others. A similar
argument might lead us to centre the “11 and over” group a little more to the left.

The average heights at the end of the experiment are of course set out four
months to the right of those at the beginning and it will be noticed that except for
the first group, which is clearly out of place, none of the points diverges very
much from its appropriate line of growth, whether “controls”, “raws” or
“pasteurized”.

The case is very different in Diagrams 3 and 4 which show the corresponding 
average weights. Here there is, after the first two ages, a very decided dip, 
especially in the later ages. The weights at the end of the experiment are too low. 
This might be accounted for by a tendency in older children to grow normally in 
height and subnormally in weight during the spring, but I think it much more
likely that older children wear about 1 lb. more clothes in February than they do 
in June, while in the case of younger children a more limited wardrobe permits 
of fewer discards. 

The authors have tried to show that the selection of the “controls” has not 
affected the validity of the comparison, by computing the correlation coefficients 
between the original heights (and weights) and the growth during the experiment 
for each of the 42 age groups into which the measurements were divided. These 
they find to be quite small even though they are here and there significant, and
they argue that the additional height and weight of the “controls” was without
effect on the comparison of subsequent growth.

Now this might have been a perfectly good argument had the height and weight
been selected directly, but if, as I have indicated was very likely the case, the
selection was made according to some unconscious scale of well-being, then it is
surely natural to suppose that the relatively ill-nourished “feeders” would benefit
more than their more fortunate school mates, the “controls”, would have done by
the extra ¾ pint of milk per day.

That being so, how are we to regard the conclusions of the Report:* 

(1) “The influence of the addition of milk to the diet of school children is 
reflected in a definite increase in the rate of growth both in height and weight.” 

This conclusion was probably true; the average increase for boys’ and girls’
heights was 8 % and 10 % over “controls” and for boys’ and girls’ weights was
30 % and 45 % respectively, and though, as pointed out, the figures for weights
were wholly unreliable it is likely enough that a substantial part of the difference
in height and a small part of that in weight were really due to the good effect
of the milk. The conclusion is, however, shifted from the sure ground of scientific
inference to the less satisfactory foundation of mere authority and guesswork by
the fact that the “controls” and “feeders” were not randomly selected.

* See footnote on p. 169.

(2) “There is no obvious or constant difference in this respect between boys
and girls and there is little evidence of definite relation between the age of the 
children and the amount of improvement. The results do not support the belief 
that the younger derived more benefit than the older children. As manifested 
merely by growth in weight and height the increase found in younger children 
through the addition of milk to the usual diet is certainly not greater than, and is 
probably not even as great as, that found in older children.” 
Now from the authors’ point of view, believing in the validity of their comparisons
in weight, this is much understating the case, as the following table
derived from Capt. Bartlett’s condensed tables* shows:


Age in       Gain in weight in ounces     Gain in height in inches       As % of control
years        by feeders over controls     by feeders over controls       Weight         Height
             Boys          Girls          Boys            Girls          Boys   Girls   Boys   Girls
5, 6 and 7   1.13 ±0.73    1.24 ±0.72     0.083 ±0.011    0.059 ±0.011   9      13      11     8
8 and 9      3.15 ±0.68    4.47 ±0.67     0.071 ±0.011    0.098 ±0.010   30     51      10     14
10 and 11    5.21 ±0.85    7.88 ±0.79     0.037 ±0.012    0.055 ±0.012   78     73      ?      8


Note that the p.e.’s (probable errors) are calculated from Capt. Bartlett’s tables and are subject,
as his are, to his having interpreted the methods of the original Report correctly.
From this they might have concluded: 


(a) That in the matter of weight older children, both boys and girls, derived 
more benefit than younger, while 


(b) In height the younger boys did better than the older, though the difference 
is not quite significant, but that there was no regular tendency in the matter of 
girls’ height. 

In the light of previous criticism, however, we must be content to say that
apparently the differential shedding of clothes between the “feeders” and the
more fortunate “controls” is more marked with older children (and possibly with
girls than with boys), and that there is some probability that younger boys gain
in height more than older.

Finally, conclusion (3) runs: “In so far as the conditions of this investigation
are concerned the effects of raw and pasteurized milk on growth in weight and 
height are, so far as we can judge, equal.” 

This conclusion has been challenged by Capt. Bartlett,* and by Dr Fisher and
Capt. Bartlett,† who conclude that there is definite evidence of the superiority
of raw over pasteurized milk in both height and weight.

Even they, however, point out that the raw and pasteurized milk were not
supplied to the same schools, and their conclusion amounts to saying: “If the
groups of children taking raw and pasteurized milk respectively were random
samples from the same population, the observed differences would be decisively
in favour of the raw milk.”‡

Unfortunately they were not random samples from the same population: they
were selected samples from populations which may have been different, and moreover
the “controls” with which they were compared were not appropriate to
either group; and so (again it is a matter of guess and authority) I would be
very chary of drawing any conclusion from these small biased differences.

* “Nutritional value of raw and pasteurized milk”, by Stephen Bartlett, M.C., B.Sc.
(J. Min. Agric., April 1931).

† Nature, 18 April 1931, p. 591, “Pasteurized and raw milk”.

That is not to say that there is no difference between the effect of raw and 
pasteurized milk—personally I believe that there is and that it is in favour of raw 
milk—but that this experiment, in spite of all the good work which was put into 
it, just lacked the essential condition of randomness which would have enabled 
us to prove the fact. 

This note would be incomplete without some constructive proposals in case it
should be considered necessary to do further work upon the subject, and accordingly
I suggest the following:

(1) If it should be proposed to repeat the experiment on the same spectacular 
scale, 

(a) The “controls” and “feeders” should be chosen by the teachers in pairs of
the same age group and sex, and as similar in height, weight and especially physical
condition (i.e. well- or ill-nourished) as possible, and divided into “controls” and
“feeders” by tossing a coin for each pair. Then each pair should be considered to
be a unit and the gain in weight and height by the “feeder” over his own control
should also be considered as a unit for the purpose of determining the error of the
gain in weight or height.

In this way the error will almost certainly be smaller, perhaps very much 
smaller, than if calculated from the means of “feeders” and “controls”. 
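The advantage of treating each pair as a unit can be checked with a small simulation. This is a sketch under invented assumptions: each pair shares a common component (what the teacher matched on), each child adds a smaller individual component, a coin toss picks the "feeder", and milk adds a fixed `effect`. The error is computed once from the pair differences and once as if from the separate means of "feeders" and "controls".

```python
import math
import random
import statistics

def paired_vs_unpaired(n_pairs=500, shared_sd=1.0, individual_sd=0.3,
                       effect=0.2, seed=7):
    """Hypothetical model: each pair shares a component (same age, sex and
    condition); a coin toss decides which member becomes the "feeder"."""
    rng = random.Random(seed)
    feeders, controls = [], []
    for _ in range(n_pairs):
        common = rng.gauss(0.0, shared_sd)          # what the teacher matched on
        first = common + rng.gauss(0.0, individual_sd)
        second = common + rng.gauss(0.0, individual_sd)
        if rng.random() < 0.5:                       # the coin toss
            first, second = second, first
        feeders.append(first + effect)               # milk adds `effect` to the gain
        controls.append(second)
    # Error from the pair differences, each pair treated as one unit.
    diffs = [f - c for f, c in zip(feeders, controls)]
    se_paired = statistics.stdev(diffs) / math.sqrt(n_pairs)
    # Error as if calculated from the separate means of feeders and controls.
    se_from_means = math.sqrt(statistics.variance(feeders) / n_pairs
                              + statistics.variance(controls) / n_pairs)
    return statistics.fmean(diffs), se_paired, se_from_means

mean_gain, se_paired, se_from_means = paired_vs_unpaired()
print(f"advantage {mean_gain:.3f}, s.e. paired {se_paired:.3f}, "
      f"s.e. from means {se_from_means:.3f}")
```

The closer the matching (the larger `shared_sd` relative to `individual_sd`), the larger the reduction in error.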

If in addition the social status of each pair be noted (well-to-do, medium, poorly 
nourished or some such scale) further useful information will be available for 
comparing pasteurized and raw “feeders”. 

If this is found to be too difficult, a perfectly good comparison can be made by
adhering to the original plan of the 1930 experiment and drawing lots to decide
which should be “controls” and which “feeders” (this is better than an alphabetical
arrangement), but the error of the comparison is likely to be larger than
in the plan outlined above.

(b) If it is at all possible each school should supply an equal number of raw and 
pasteurized “feeders”, again by selection of similar children followed by coin 
tossing, but I fear that this is a counsel of perfection. 

(c) Some effort should be made to estimate the weight of clothes worn by the 
children at the beginning and end of the experiment: possibly the time of year 
could be chosen so that there would be little change in this respect. 

(2) If it be agreed that milk is an advantageous addition to children’s diet—and 
I doubt whether any one will combat that view—and that the difference between 
raw and pasteurized milk is the matter to be investigated, it would be possible to 
obtain much greater certainty at an expenditure of perhaps 1-2 % of the money* 
and less than 5 % of the trouble. 

* This is a serious consideration: the Lanarkshire experiment cost about £7500.

For among 20,000 children there will be numerous pairs of twins; exactly how
many it is not easy to say owing to the differential death-rate, but, since there is 
about one pair of twins in 90 births, one might hope to get at least 160 pairs in 
20,000 children. But as a matter of fact the 20,000 children were not all the 
Lanarkshire schools population, and I feel pretty certain that some 200-300 pairs 
of twins would be available for the purpose of the experiment. 

Of 200 pairs some 50 would be “identicals” and of course of the same sex, while
half the remainder would be non-identical twins of the same sex.
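The rough arithmetic of the two preceding paragraphs can be written out as follows. The sketch reads "one pair of twins in 90 births" as one twin maternity in 90 (an assumption) and ignores deaths, so it gives a ceiling; the text's lower figure of 160 presumably allows for the differential death-rate it mentions. The identical share of one quarter is taken from "of 200 pairs some 50".

```python
def twin_pairs_before_deaths(n_children: int, maternities_per_twin_birth: int = 90) -> float:
    """Twin pairs expected among n_children if one maternity in
    `maternities_per_twin_birth` produces twins and no child has died.
    Every such group of maternities yields (maternities_per_twin_birth + 1)
    children, two of whom are a twin pair."""
    return n_children / (maternities_per_twin_birth + 1)

def same_sex_pairs(n_pairs: float, identical_share: float = 0.25) -> float:
    """Identical pairs (always of one sex) plus about half the non-identical pairs."""
    identical = n_pairs * identical_share
    return identical + (n_pairs - identical) / 2

print(round(twin_pairs_before_deaths(20_000)))   # 220
print(same_sex_pairs(200))                       # 125.0
```

So of 200 pairs, about 125 (50 identical and 75 non-identical) would be of like sex and usable for the proposed comparison.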

Now identical twins are probably better experimental material than is available 
for feeding experiments carried out on any other mammals, and the error of the 
comparison between them may be relied upon to be so small that 50 pairs of 
these would give more reliable results than the 20,000 with which we have been 
dealing. 

The proposal is then to experiment on all pairs of twins of the same sex available, 
noting whether each pair is so similar that they are probably “identicals” or 
whether they are dissimilar. 

“Feed” one of each pair on raw and the other on pasteurized milk, deciding in 
each case which is to take raw milk by the toss of a coin. 

Take weekly measurements and weigh without clothes. 

Some way of distinguishing the children from each other is necessary or the 
mischievous ones will play tricks. The obvious method is to take finger-prints, but 
as this is identified with crime in some people’s minds, it may be necessary to 
make a different indelible mark on a fingernail of each, which will grow off after 
the experiment is over. 

With such comparatively small numbers further information about the dietetic 
habits and social position of the children could be collected and would doubtless 
prove invaluable. 

The comparative variation in the effect in “identical” twins and in “unlike” 
twins should furnish useful information on the relative importance of “Nature 
and Nurture”. 

To sum up: The Lanarkshire experiment devised to find out the value of giving 
a regular supply of milk to children, though planned on the grand scale, organized 
in a thoroughly business-like manner and carried through with the devoted 
assistance of a large team of teachers, nurses and doctors, failed to produce a valid 
estimate of the advantage of giving milk to children and of the difference between 
raw and pasteurized milk. 

This was due to an attempt to improve on a random selection of the “controls”
which in fact selected as “controls” children who were on the average taller and 
heavier than those who were given milk. 

The hypothesis is advanced that this was due not to a selection of the shorter, 
lighter children as such to take the milk, but to an unconscious bias leading the 
teachers to pick out for this purpose the needier children whom the milk would
be most likely to benefit. 

This hypothesis is supported by the fact that while the advantage derived from 
the milk was only 8-10 % of the gain in height, without much variation for age, 
it was 30-45 % of the gain in weight, varying from 9 to 13 % in the younger 
children (who do not seem to have shed much clothing in the summer) up to 
73-78 % in the older children—who obviously did. 

Suggestions are made for the arrangement: 

(1) Of a similar large-scale experiment on random lines, and 

(2) Of a much smaller and cheaper experiment carried out on pairs of twins 
of like sex. 

The second is likely to provide a much more accurate determination of the 
point at issue, owing to the possibility of balancing both nature and nurture in 
the material of the experiment. 
ON THE “z” TEST
[Biometrika, XXIII (1931), p. 407]

In the last number of Biometrika Prof. Pearson sounds a warning note against
the use of “Student’s” z to determine the significance of an average difference
between two sets of correlated variables.

As this use is one to which I attach considerable importance, and as Prof.
Pearson’s criticism does not all seem to me to be concerned with my own method
of using z, I should like to present the case from the point of view of the experimenter
who for some reason or another has to work with small samples.

In the great majority of experiments of this kind we are concerned with a
difference: e.g. of yield between two varieties of a cereal; of weight between pigs
fed on complete food and others on food deficient in vitamins; of size of loaf
between breads baked from different flours; of reaction times between alcoholized
and non-alcoholized persons, and so on.

Now it is an elementary principle in all such experiments to reduce the error of
such differences by arranging that they should be between variates which, apart
from the experiment, are as similar as possible.

Thus each pair of cereal plots should be grown not only on the same field but as
near as possible to each other in that field; the pigs should be of the same sex,
from the same litter and as nearly as possible the same weight at the start; the
loaves should be mixed at the same time and put next to one another in the oven;
and the alcoholized and non-alcoholized persons should alternate their roles so
as to compare each person with himself.

In other words, every care should be taken when planning the experiment to 
get the correlation between corresponding variates as high as possible, with the 
object of reducing the error and so of obtaining significant results from the small 
number of experiments which it is possible to carry out. 
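The gain from high correlation follows from the standard formula for the variance of a difference. In the sketch below, ρ is the population correlation between corresponding variates, σ their common standard deviation (an assumption of equal variability made for simplicity), and \bar d the mean of n differences:

```latex
\operatorname{Var}(A - B) = \sigma_A^2 + \sigma_B^2 - 2\rho\,\sigma_A\sigma_B,
\qquad
\sigma_A = \sigma_B = \sigma \;\Rightarrow\;
\operatorname{s.e.}(\bar d) = \sigma\sqrt{\frac{2(1-\rho)}{n}}.
```

With ρ = 0.9, for instance, the standard error is √0.1 ≈ 0.32 of what it would be with ρ = 0 for the same number of experiments.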

Now Prof. Pearson’s criticism may be summarized under three heads:

(1) That, assuming the advisability of the “z” Test, it is only when the value
of z which is obtained is high that we can draw any useful conclusion: if it is low
it cannot detect samples which may be abnormal in other ways.

Agreed, but this inability to detect abnormalities extraneous to the test itself 
is shared with all single tests of significance and the result is that the wise man 
will never go further in the direction of asserting similarity than to say “The 
sample affords no evidence that, etc.” 


(2) That since we cannot deduce with accuracy the correlation of the population
at large from the small sample before us, we are debarred from making use
of that correlation to reduce our error.

But in fact we do not use the correlation in testing the significance. In obtaining
our sample of differences, yes, but once we have the differences they are merely
a sample from the indefinitely large population of differences, which might have
been produced under similar conditions and, I may say, with the same ρ (not r), and
which we, rightly or wrongly, assume to be normal. At all events Prof. Pearson
is not here attacking z on the grounds of lack of normality. The correlation will
vary from sample to sample, just as does the mean or standard deviation, but
these variations in correlation do not affect the fact, which Prof. Pearson admits,
that in a normal population z can be used to test the significance of the mean of
small samples.

What we actually ask ourselves is the following question: 

If the average difference between A and B in the population were zero, what
would be the probability of obtaining a sample of differences giving a value of z
as high as that observed? If this probability is sufficiently small, we say that
the difference is significant.
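The question above can be sketched in code. This assumes the 1908 definition of z (mean difference divided by the standard deviation of the differences computed with divisor n) and its relation to the modern t statistic, t = z√(n − 1) with n − 1 degrees of freedom; the tail probability is obtained by Simpson's rule on the t density so that only the standard library is needed. The paired readings are invented for illustration and are not the hyoscyamine data. The last lines illustrate point (2): adding any amount to both members of a pair changes the correlation of A with B but leaves z unchanged.

```python
import math

def students_z(diffs):
    """Student's z for a sample of differences: the mean divided by the
    standard deviation computed with divisor n (the original 1908 convention)."""
    n = len(diffs)
    mean = sum(diffs) / n
    s = math.sqrt(sum((d - mean) ** 2 for d in diffs) / n)
    return mean / s

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for Student's t with df degrees of freedom, by Simpson's rule
    on the t density between 0 and t (steps must be even)."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def density(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps
    total = density(0.0) + density(t)
    total += sum((4 if i % 2 else 2) * density(i * h) for i in range(1, steps))
    return 0.5 - total * h / 3

# Hypothetical paired readings (e.g. hours of sleep for the same ten persons
# under treatments A and B); invented for illustration only.
a = [6.1, 7.0, 8.2, 7.6, 6.5, 8.4, 6.9, 6.7, 7.4, 5.8]
b = [5.2, 7.9, 3.9, 4.7, 5.3, 5.4, 4.2, 6.1, 3.8, 6.3]
diffs = [x - y for x, y in zip(a, b)]

n = len(diffs)
z = students_z(diffs)
t = z * math.sqrt(n - 1)      # the modern t statistic, n - 1 degrees of freedom
p = t_upper_tail(t, n - 1)    # chance of a z this high if the true mean difference is 0
print(f"z = {z:.3f}, t = {t:.3f}, one-sided P = {p:.4f}")

# Only the differences enter: shifting both members of each pair by the same
# amount (which alters the A-B correlation) leaves z unchanged.
shift = [10.0 * i for i in range(n)]
z_shifted = students_z([(x + s) - (y + s) for x, y, s in zip(a, b, shift)])
```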

(3) Prof. Pearson warns us to be careful to draw our conclusion from the
experiment we have carried out and according to the particular set of differences
which we have tested for significance.

In this of course I agree with him, yet I do not feel that the warning is very
much needed. In the hyoscyamine experiment which he quotes, we are able to
deduce significance from a consideration of the effects of these drugs on the same
individuals, while we could not do so from groups of different individuals. But
surely no one is much interested in the latter point; if I am to take one of the drugs
I will pay a good deal of attention to the probability that laevo will make me
personally sleep longer than dextro and very little to the fact that the experiments
give no satisfactory answer to the question of what will happen if I take laevo
and my wife dextro.

To sum up, in properly planned experiments errors should be reduced as much
as possible by the selection of highly correlated individuals to compare with one
another. This correlation should to a greater or lesser extent reduce the variation
in the differences between these individuals but does not prevent them being
considered to be a sample drawn from a population of differences to which the
“z” Test may be applied. Finally, care must be taken in planning the experiment
that the differences to be examined for significance shall be those which furnish
an answer to the question which we are asking.
EVOLUTION BY SELECTION
THE IMPLICATIONS OF WINTER’S SELECTION EXPERIMENT*

[Eugenics Review, XXIV (1933), p. 293]

For some time after the publication of the Origin of Species it was generally
held by those who accepted Darwin’s reasoning that species originated by the 
accumulation of small variations in the same direction under the influence of 
natural selection; and the occurrence of large “mutations”, such as the Ancon 
sheep, was perhaps rather overlooked. The rediscovery of Mendelism, however, 
has tended to emphasize the latter portion of Darwin’s work, rather to the 
exclusion of the former, until it is actually held in certain quarters that the 
selection of small differences can only lead to small, or at all events strictly limited, 
changes of type. 

Yet it cannot be denied that, apart from colours and other “fancy” points, 
the actual improvement of domestic animals has usually proceeded by just this 
accumulation of small differences. 

If I am not mistaken, the view that selection is limited can be traced back 
to Johannsen’s work, where he showed that from an ordinary stock of beans 
there could be isolated a number of “pure lines”, which differed from each other 
in the mean weight of their seed, but within each of which no appreciable genetic 
variation in seed weight could be detected. 

His work has led to a considerable advance in the selection of cereal seed, 
since it is quite certain that for practical purposes “pure line” seed will behave 
in much the same way as if the plants were propagated vegetatively; they will 
start growing together and will ripen together, and their seed will be uniform 
and behave uniformly in its turn. Yet Johannsen, working of course with self-fertilizing
material, found pure lines, not a pure line. Obviously, therefore,
mutations had occurred with sufficient frequency to produce them; and, given 
time, it may be supposed that even in self-fertilized organisms progress could be 
made merely by selecting the extreme pure line, waiting for a mutation, selecting 
again, and so on. Tedious work—but for the Origin of Species there is now 
plenty of time. 

From a practical point of view, however, the plant breeder cannot afford to
wait for favourable mutations; he cross-fertilizes—and so in most cases does
nature. Now until experience has been accumulated, the results of cross-fertilization
are unpredictable; but very soon certain facts begin to emerge:
“Tall is dominant to dwarf”, “Two rowed is dominant to six rowed”, and so
forth, and such things attract attention and rather obscure other equally important
facts. Cross a “dense” and a “lax” variety, and among the ultimate
progeny may be found plants “denser” than the “dense” parent and laxer
than the “lax” and almost anything between. Cross high and low protein, and
the same overlapping will be found when the first mix-up has sorted itself out.

* Winter, Floyd L., “Continuous selection for composition in corn”, J. Agric. Res.,
July-December 1929, pp. 451-75.

Try to explain this on Mendelian lines and it will soon become obvious that 
even in self-fertilized plants there must be a tremendous variety of genetical 
make-up; one or two relevant genes will be quite inadequate to explain the facts, 
ten or twenty will complicate the calculation, but will be none too many. Perhaps 
it would be better to postulate 200-300 and reduce the problem to mathematics. 

Since characters which do not affect the survival of the organism are not 
encountering selection, an ordinary cross-fertilizing population must be expected 
to accumulate among all its members very large numbers of genes corresponding 
to such unessential characters. In ordinary times these would roughly neutralize 
one another, each individual carrying a mixture of genes which would produce
variation in opposite directions, so that only a limited genetic variation would 
result; but with a change of environment this reservoir of genes would serve a 
very useful purpose as raw material for selection: some characters, formerly 
neutral, would then affect survival and all those genes which produce favourable 
somatic variation would tend to be preserved while their opposite numbers 
would be eliminated. Thus the accumulation of small variations in the same 
direction could proceed far beyond the original range.* 

* Perhaps this argument may be clarified by an illustration. Suppose during a period
when height is of no particular importance to an organism two hundred small mutations
have succeeded in establishing themselves in equilibrium, each of which affects height to
an equal extent, say, 1 mm. We may represent the first gene as either $a_1$, present, or $b_1$,
absent, the second as $a_2$ or $b_2$, and so on. Then any individual will contain either $a_1a_1$,
$a_1b_1$ or $b_1b_1$, and the proportions in which these possibilities occur will be assumed for the
sake of illustration to be 1:2:1; similarly with the other subscripts, so that the distribution
of individuals according to the numbers of “a” genes which they contain will be in proportion
to the coefficients of the binomial $(a^2 + 2ab + b^2)^{200}$ or of $(a + b)^{400}$.

The standard deviation of this binomial distribution is 10, so that although it would be
possible for an individual to contain the “a” genes in any number from 0 to 400, yet in
practice even a population of 100,000,000 would be very unlikely to outrange 140-260,
corresponding to 120 mm. of height between the highest and the lowest individual, less
than one-third the possible range.

If now we imagine only the highest half of the population to mate (at random) we should 
get a rise in “a” content of 8 in the mean value, to 208, while the standard deviation and 
range would hardly be altered, so that the process could be repeated, a further rise of 
8 mm. obtained, and so on until the mean would rise well beyond the value of the original 
extreme individual: and all this without fresh mutations. Of course this illustration has 
been simplified to the point of absurdity, but it may serve to exhibit the possibility of 
such potential variation. 
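The figures in this footnote can be checked directly. The sketch uses the footnote's own simplified model (200 genes at frequency 1/2, each adding 1 mm., no environmental variation, random mating, so the offspring mean equals the mean of the selected parents) and a normal approximation to the binomial: the rise of 8 is then the mean of the upper half of a distribution with standard deviation 10.

```python
import math

def allele_count_sd(n_genes: int, p: float = 0.5) -> float:
    """SD of the number of "a" genes when each of n_genes loci carries two
    genes, each an "a" with probability p (p = 0.5 gives the 1:2:1 ratio)."""
    return math.sqrt(2 * n_genes * p * (1 - p))

def upper_half_shift(sd: float) -> float:
    """Distance of the mean of the upper half of a normal population above the
    whole-population mean: sd * phi(0) / 0.5."""
    phi0 = 1.0 / math.sqrt(2.0 * math.pi)
    return sd * phi0 / 0.5

sd = allele_count_sd(200)              # 10.0 "a" genes, i.e. 10 mm.
rise = upper_half_shift(sd)            # about 8 per generation
# Expected number of individuals outside 140-260 (six SDs either side of 200)
# in a population of 100,000,000, using the normal approximation.
beyond = 100_000_000 * math.erfc(6.0 / math.sqrt(2.0))
print(sd, round(rise, 2), round(beyond, 2))
```

Fewer than one individual in a hundred million is expected outside 140-260, while each round of half-selection moves the mean by about 8 without using up much of the spread.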
That such a state of things does indeed exist seems to be indicated by Winter’s
paper, to which I am now drawing attention. This describes a very determined 
experiment carried out on “corn”, i.e. maize. Now maize is commonly cross- 
fertilized; unless cross-fertilization takes place, the stock is apt to die out—which 
makes pure line selection very difficult. Nevertheless much may be done by mass 
selection, and it is with mass selection that Winter was concerned. 

Premising that he selected continuously for twenty-eight years, from 1896 to 
1924, it is perhaps best to quote his description of the procedure verbatim: 

One hundred and sixty-three ears of a variety known as “Burr’s White” were used 
as foundation stock from which selections were made in four different directions, 
namely for high oil, low oil, high protein and low protein. 

These four strains were carried on in the same way. In the high protein, for example, 
twenty-four ears highest in protein were selected for seed and planted in an isolated 
plot, each ear in a separate row. These ears were harvested separately and the seed 
for the next crop selected from the ears which were found to be highest in protein. 
Nine years later the system was modified somewhat in an attempt to prevent loss of 
vigour by inbreeding. Alternate rows were detasselled and seed was selected only 
from the highest yielding detasselled rows. In 1921 this system was again modified 
to reduce the amount of inbreeding. Two seed ears were taken from each of the 
detasselled rows regardless of yield. 

The high oil, low oil and low protein tests were similarly conducted, selection being 
made each year of ears highest in oil, lowest in oil and lowest in protein, respectively. 

For a proper appreciation of the work the original paper should be consulted, 
but only a few figures will be necessary to display the interest of the results: 

I will deal with the figures giving the percentage of oil, which are the more 
striking, but the facts are similar in the case of the protein. 

(1) Two strains have been selected, one which has a mean percentage of oil 
about twelve times the standard deviation of the original population above the 
original mean, and the other about seven times below. As illustrating this, the 
minimum value in the high race during the last five years is considerably higher 
than the maximum value found during the first four years and, on the other 
hand, the maximum value in the low race is even more markedly below the 
lowest in the first four years. 

(2) Although the standard deviation of the high race has risen and that of the 
low has fallen during the experiment, it would be hard to say whether on the 
whole there has been a decrease or an increase in variability owing to the selection.*

* Dr Rasmussen, of Svalöf, has pointed out to me that this might perhaps be explained
by an exaggeration in the environmental effect when acting on plants enfeebled by
inbreeding. But the steady rise in oil percentage right up to the end of the experiment seems to
require an almost undiminished genetic variability.

We may assume the variance to be composed of two parts, one inherent and
therefore subject to selection, and the other environmental, or “fluctuating”,
and therefore a hindrance to selection. Just what proportion we should allot to
each of these we have no sure means of judging, but in both cases the latter is,
I believe, likely to be very large. Incidentally it may perhaps account in both
cases for the obvious correlation* between the mean and the standard deviation.

Protein

Year    Mean value %     Standard deviation   Lowest variate   Highest variate
        High    Low      High    Low          High    Low      High    Low
1896       10.93             1.04                 8.3              13.9
1897    10.99   10.63    1.16    0.90          8.3    8.2      13.6    14.0
1898    10.98   10.49    1.22    1.32          7.7    7.5      14.9    13.4
1899    11.62    9.59    1.28    1.01          8.4    6.7      14.8    13.1
1920    14.01    7.54    1.79    0.89          9.5    6.0      17.4    10.5
1921    16.66    9.14    1.84    1.35          9.4    6.6      18.8    13.4
1922    17.34    7.42    1.24    0.70         12.6    6.1      20.6     9.6
1923    16.53    6.48    1.41    0.73         13.1    5.0      19.7     9.4
1924    16.60    8.38    1.19    1.17         14.6    6.1      19.2    11.8

Oil

Year    Mean value       Standard deviation   Lowest variate   Highest variate
        % of oil
        High    Low      High    Low          High    Low      High    Low
1896       4.68              0.41                 3.9              6.0
1897     4.79   4.10     0.38    0.29          3.6    3.4       5.7     4.7
1898     5.10   3.59     0.48    0.32          4.1    3.2       6.7     4.8
1899     5.65   3.85     0.42    0.32          4.3    2.8       6.5     4.6
1920     9.28   1.80     0.52    0.21          7.8    1.0      10.6     2.4
1921     9.94   1.71     0.66    0.15          8.4    1.0      11.7     2.3
1922     9.86   1.68     0.54    0.19          8.7    0.9      11.3     2.2
1923    10.08   1.58     0.65    0.24          8.3    ?        11.8     2.1
1924     9.86   1.51     0.61    0.22          8.4    0.9      11.7     2.2

In any case the inherent part of the variation had of course a smaller standard deviation than that observed for the whole, perhaps much smaller, so that the movements of the means were, respectively, more than twelve and seven times this “inherent” standard deviation. Hence either the possibilities of variation latent in the original material were enormous or a steady stream of favourable mutations was maintained to carry the means along.
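The figures of twelve and seven can be checked roughly by expressing the movement of the oil means in units of the observed standard deviation of the foundation stock. The text does not say which means it compares, so the sketch assumes the 1896 mean (4.68) against the 1924 means of the high (9.86) and low (1.51) races, with the observed s.d. of 0.41. The inherent s.d. is smaller than 0.41, so the printed ratios are lower bounds.

```python
# Oil figures from the table above (per cent of oil).
foundation_mean = 4.68   # 1896, Burr's White foundation stock
foundation_sd = 0.41     # observed (total) standard deviation in 1896
high_1924 = 9.86         # mean of the high-oil race in 1924 (assumed comparison point)
low_1924 = 1.51          # mean of the low-oil race in 1924 (assumed comparison point)

# Movement of each mean, expressed in units of the observed s.d.
high_ratio = (high_1924 - foundation_mean) / foundation_sd
low_ratio = (foundation_mean - low_1924) / foundation_sd

# The "inherent" (genetic) s.d. is smaller than 0.41, so measured in
# inherent-s.d. units the movements are larger than these ratios.
print(f"high race moved {high_ratio:.1f} s.d., low race {low_ratio:.1f} s.d.")
```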

In any case, these results cannot be explained on the basis of a few easily detected genes. But by reducing the problem to the simplest possible basis, starting from the intensity of selection, the rate of movement of the means at first, and the difference between the initial and final values of the mean, it is possible to make some sort of calculation of the minimum number of genes which might allow of so large a change by repeated selection. And I find that the order of these numbers is 100-300. There is little indication, however, that selection

* It is reasonable to suppose that a given variation in environment would produce 
greater variation in a high genetic stock than in a low one. 
had yet reached its limit after twenty-eight years, and we should probably be
within the mark if we assumed that the number of genes affecting oil (or protein) 
content in Burr’s White Maize may run up to thousands. 

But if we have thousands of genes, continuous selection in one direction may, in fact must, result in progress almost without limit (at all events until the progress itself induces counter-selection, as perhaps it does in the case of low oil content), for although selection will reduce the number of genes, there will be time for fresh mutations to occur to keep up the possibility of further selection.

Summary and Conclusion 

To sum up: Winter has in this experiment succeeded, by continuous mass 
selection, in producing two races of maize, one of which has more than twice, 
and the other less than one-third, the normal oil content. 

In a character so influenced by environment the progress has, of course, not 
been uniform in its manifestation; but it appears to have been comparatively so 
genetically, and shows little or no indication that it has reached its limit in either 
direction. 

It does not appear that such steady progress could be obtained with less than 
hundreds of genes affecting oil content and it seems not unlikely that there are 
thousands. In any case it is clear that the possibilities of continuous selection 
of small variations for the formation of new species are likely to be very much 
greater than would appear merely from a consideration of Johannsen’s work on 
pure lines, which was carried out on a self-fertilizing organism. 

And so we reach the conception of species patiently accumulating a store of 
genes, of no value under existing conditions and for the most part neutralized 
by other genes of opposite sign. When, however, conditions change, unless too 
suddenly or drastically, the species finds in this store genes which give rise to 
just the variation which will enable it to adapt itself to the change. 

It follows that the change appears to have produced the variation which it 
has merely selected from among those potentially present. Thus we can reconcile 
the view held, amongst other people, by the late Walter Heape, that the environment produces the required variation, with the older Darwinian selection of
random variations, to which it appears at first sight to be diametrically opposed. 
A CALCULATION OF THE MINIMUM NUMBER OF
GENES IN WINTER’S SELECTION EXPERIMENT 

[Annals of Eugenics, VI (1934), p. 77] 

In a note on Winter’s selection experiment* published in the Eugenics Review†
I made the following claim: 

By reducing the problem to the simplest possible basis... it is possible to make some 
sort of calculation of the minimum number of genes which might allow of so large a 
change by repeated selection. And I find that the order of these numbers is 100-300. 

Prof. Fisher, however, pointed out in Nature‡ that I had in fact over-simplified
my problem and that no such conclusion could be drawn from my “sort of 
calculation”. 

This did not in fact invalidate my main thesis, which was that species tend to 
accumulate a sufficient store of genes of no particular value until they meet with 
a change of environment, when the store provides material for selection far 
beyond the normal range. 

But although the calculation was based on over-simplified data and was 
superfluous to my argument, it is of some interest in itself, and the present note 
is an attempt to “mend my hand” by making more reasonable assumptions. 

I shall start by giving a very short account of Winter’s experiment with an 
abbreviated table, hoping that my readers may be sufficiently interested to study 
Winter’s paper for themselves. 

Then I shall make an estimate of the standard deviation of that part of the variation in oil content of Winter’s maize which was due to genetic constitution, and measure the difference between the mean oil content of his “high” and “low” races in terms of this standard deviation.

I shall next make an estimate of the minimum number of genes which would suffice to account for so large a ratio between the possible range and the standard deviation.

Finally, I shall discuss the various assumptions which have been made, pointing 
out which of them are in my opinion reasonable, which have reduced the minimum 

* “The mean and variability as affected by continuous selection for composition in 
corn”, J. Agric. Res. xxxix (1929), pp. 451-75. 

† “Evolution by selection. The implications of Winter’s selection experiment”, Eugen. Rev. xxiv (4 Nov. 1933) [18].

‡ “Number of Mendelian factors in quantitative inheritance”, Nature, cxxxi (18 March 1933), p. 400.
number of genes to a figure below that which is probable, and which are merely
the best assumptions we can make. 

Winter’s experiment, then, was concerned with a continuous selection of maize in the directions of high and low protein and high and low oil content, and I am only concerned here with the latter.

The experiment was begun in 1896 and has continued to the present day,* 
but only the first 28 years were reported on in his paper, i.e. till 1924. The following 
is his description of the procedure, which I have only altered by instancing the 
oil content part of the experiment, whereas he quoted the similar case of the 
protein: 

One hundred and sixty-three ears of a variety known as “Burr’s White” were used
as foundation stock from which selections were made in four different directions, 
namely for high oil, low oil, high protein and low protein. 

These four strains were carried on in the same way. In the high oil, for example, 
twenty-four ears highest in oil were selected for seed and planted in an isolated plot, 
each ear in a separate row. These ears were harvested separately and the seed for the 
next crop selected from the ears which were found to be highest in oil. Nine years 
later the system was modified somewhat in an attempt to prevent loss of vigour by 
inbreeding. Alternate rows were detasselled and seed was selected only from the 
highest yielding detasselled rows. In 1921 this system was again modified to reduce 
the amount of inbreeding. Two seed ears were taken from each of the detasselled rows 
regardless of yield. 

The high protein, low protein and low oil tests were similarly conducted, selection 
being made each year of ears highest in protein, lowest in protein and lowest in oil, 
respectively. 


          No. of ears     Mean value,          Standard          Lowest          Highest
          analysed        percentage of oil    deviation         variate         variate
Year      High    Low     High      Low        High     Low      High    Low     High    Low
1896         163             4.68                 0.41              3.9             6.0
1897       80      50      4.79      4.10       0.38     0.29     3.6     3.4      5.7     4.7
1898      216     108      5.10      3.59       0.48     0.32     4.1     3.2      6.7     4.8
1899      108     144      5.65      3.85       0.42     0.32     4.3     2.8      6.5     4.6
1900      108     144      6.10      3.57       0.44     0.36     4.6     2.6      7.4     4.5
1901      126     126      6.24      3.45       0.45     0.26     4.9     2.8      7.1     4.1
1920      120     120      9.28      1.80       0.52     0.21     7.8     1.0     10.6     2.4
1921      120     120      9.94      1.71       0.66     0.15     8.4     1.0     11.7     2.3
1922      120     120      9.86      1.68       0.54     0.19     8.7     0.9     11.3     2.2
1923      120     120     10.08      1.58       0.65     0.24     8.3     1.1     11.8     2.1
1924      120     120      9.86      1.51       0.61     0.22     8.4     0.9     11.7     2.2


The above table gives certain figures for the first six and the last five years of 
the experiment, and it will be seen that, although the original maize only varied 

* Mr Winter in correspondence a year or two ago told me that both these experiments and one on height were still being continued and still showed a continued, if less marked, effect of selection. In the latter case he had arrived at mean heights of 8 ft. and 8 in. in two races derived from a 4 ft. maize.
in oil content from 3.9 % to 6.0 %, the lowest variate of the high race after 28 years of selection was 8.4 % in oil content, while the highest variate of the low race was only 2.2 %; in each case they were clean outside the original range, a fact
which seems difficult to explain except on the hypothesis that the oil content of 
the original race was due to a number of genes which largely neutralized one 
another, some raising and some lowering it, thus allowing selection far outside 
the original range. 

It will be noticed that the standard deviation of the percentage of oil in the 
original race was 0.41 and that as time went on the high race became more
variable and the low less so: this was presumably due to the interaction of the 
environmental variation with the genetic, an individual tending to produce high 
oil giving more scope to changes of environment than one which tends to produce 
low oil. 

Nevertheless, on the whole the variation has not decreased, and we shall 
probably not be far wrong in assuming that there was no appreciable change in 
variability for the first three generations of selection. So that we may take the 
original standard deviation as the root mean square of the seven values 0.41, 0.38, 0.29, 0.48, 0.32, 0.42 and 0.32, which is 0.38.
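The pooled standard deviation can be reproduced in a few lines. As in the text, the pool is unweighted. It ignores the different numbers of ears analysed in each year.

```python
import math

# Standard deviations of oil percentage for 1896 and for both races in
# 1897-1899, taken from the table above.
sds = [0.41, 0.38, 0.29, 0.48, 0.32, 0.42, 0.32]

# Root mean square: average the variances, then take the square root.
rms_sd = math.sqrt(sum(s * s for s in sds) / len(sds))
print(f"pooled s.d. = {rms_sd:.2f}")
```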

After three selections in each direction the mean of the high race was 5.65 and that of the low 3.85, a difference of 1.80, and this difference may be taken as
genetic. 

Now we are told that in the first generation 24 ears were selected in each 
direction out of 163 and, on the assumption of normal distribution of oil content, 
the mean of these selected ears would have an oil content $1.56\,\sigma_y$ above (or below) the mean, $\sigma_y$ being the standard deviation of the oil distribution. It is further stated that there were 80 ears analysed of the high race and 50 of the low in the next generation, and it is, I think, a fair inference that 24 of each of these were taken in the next selection. This is confirmed by the fact that in the later generations 120 ears ($5 \times 24$) were invariably analysed.

The mean of 24 ears selected from 80 (on the normal assumption) is $1.16\,\sigma_y$ above the mean and that of 24 from 50, $0.83\,\sigma_y$ below the mean, and the corresponding figures for the next selection (24 out of 216 and 24 out of 108) are $1.71\,\sigma_y$ and $1.34\,\sigma_y$, so that the total shift of the mean of the high race was

$$(1.56 + 1.16 + 1.71)\,\sigma_y = 4.43\,\sigma_y$$

and that of the low race

$$(1.56 + 0.83 + 1.34)\,\sigma_y = 3.73\,\sigma_y,$$

or altogether the races were shifted apart $8.16\,\sigma_y$, of which 1.80 appears to have been genetic, as shown by the distance apart after the six selections.
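The selection differentials can be reproduced approximately from the normal curve. This sketch uses the large-population truncation formula i = φ(z)/p, where p is the fraction kept and z the cut-off with upper tail p. That formula is an approximation chosen for the illustration, not necessarily the table Gosset used. For 24 out of 216 it gives about 1.70 rather than the 1.71 in the text, and the totals still come to about 4.43, 3.73 and 8.16.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, s.d. 1


def selection_intensity(kept: int, total: int) -> float:
    """Approximate mean, in s.d. units, of the best `kept` of `total`
    normally distributed values (large-population truncation formula)."""
    p = kept / total                  # proportion selected
    z = STD_NORMAL.inv_cdf(1 - p)     # cut-off point with upper tail p
    return STD_NORMAL.pdf(z) / p      # mean of the truncated tail


# Three successive selections of 24 ears in each direction.
high = [selection_intensity(24, n) for n in (163, 80, 216)]
low = [selection_intensity(24, n) for n in (163, 50, 108)]

print("high:", [round(i, 2) for i in high], round(sum(high), 2))
print("low: ", [round(i, 2) for i in low], round(sum(low), 2))
print("total separation:", round(sum(high) + sum(low), 2), "s.d.")
```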

Now if $\sigma_y$ be the standard deviation of total variation and $\sigma_g$ of that part which is genetic, then, on the supposition of independence between the environmental and genetic parts of the variation, $\sigma_g/\sigma_y$ is the correlation between the genetic and the total variation, so that $\sigma_g^2/\sigma_y^2$ is the regression factor reducing the mean of the selected portion to the mean of the next generation.

Hence

$$8.16\,\sigma_y \times \frac{\sigma_g^2}{\sigma_y^2} = 1.80,$$

or

$$\sigma_g = \sqrt{\frac{1.80}{8.16} \times 0.38} = 0.29.$$
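The regression argument comes down to one line of algebra. Each generation's genetic response is the selection differential multiplied by $\sigma_g^2/\sigma_y^2$. So $8.16\,\sigma_y$ of accumulated selection gives a genetic separation of $8.16\,\sigma_g^2/\sigma_y$. The sketch below sets this equal to the observed 1.80 and solves for $\sigma_g$ with $\sigma_y = 0.38$.

```python
import math

sigma_y = 0.38       # pooled total s.d. of oil percentage
total_shift = 8.16   # accumulated selection differential, in units of sigma_y
genetic_gap = 1.80   # high minus low mean after three selections (5.65 - 3.85)

# genetic_gap = total_shift * sigma_y * (sigma_g**2 / sigma_y**2)
#             = total_shift * sigma_g**2 / sigma_y
sigma_g = math.sqrt(genetic_gap * sigma_y / total_shift)
regression_factor = (sigma_g / sigma_y) ** 2

print(f"sigma_g = {sigma_g:.2f}, regression factor = {regression_factor:.2f}")
```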

Since the differences between the means of the high and low races in the last five generations were 7.48, 8.23, 8.18, 8.50, 8.35, we shall not be far wrong if we estimate the genetic range at not less than 29 times the genetic standard deviation (29 × 0.29 = 8.41).
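The sketch below sets the final separations beside $29\,\sigma_g$. It uses the 1920-1924 means from the table and $\sigma_g = 0.29$, and expresses each year's separation in genetic standard deviations. The ratios show where 29 falls within the five observed separations.

```python
# High- and low-race mean oil percentages, 1920-1924 (from the table above).
high = [9.28, 9.94, 9.86, 10.08, 9.86]
low = [1.80, 1.71, 1.68, 1.58, 1.51]
sigma_g = 0.29  # genetic s.d. estimated from the first three selections

differences = [h - l for h, l in zip(high, low)]
ratios = [d / sigma_g for d in differences]

for year, d, r in zip(range(1920, 1925), differences, ratios):
    print(f"{year}: difference {d:.2f} = {r:.1f} genetic s.d.")

print(f"29 x {sigma_g} = {29 * sigma_g:.2f}")
```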

We have now to estimate the minimum number of genes which will give as large 
a ratio as 29 between the maximum range and the standard deviation. 

In the first place it is clear that fewer genes will be required if the effect of each on the oil content is the same, and we shall assume that each gene if homozygous produces an effect $2k$ and if heterozygous $k$. Further, let us suppose $n$ genes, the $r$th to be present in $P_r$ of the possible loci and absent in $Q_r$, and let

$$p_r = \frac{P_r}{P_r + Q_r} \quad\text{and}\quad q_r = \frac{Q_r}{P_r + Q_r}.$$

Then $p_r^2$ individuals will be $2k$ higher owing to that gene, $2p_rq_r$ individuals will be $k$ higher owing to that gene, and $q_r^2$ individuals will have no effect from that gene. (I have taken the $r$th gene as increasing the oil, but clearly the same effect is produced in the case of a gene which decreases the oil; the convention in this case is that $p$ represents the absence of such a gene and $q$ its presence.) Then the distribution of all the $n$ genes will be given by the various terms of

$$\prod_{r=1}^{n} (p_r + q_r)^2,$$

and the extreme individuals (in genetic constitution) will be present in the proportions

$$p_1^2 p_2^2 p_3^2 \cdots p_r^2 \cdots p_n^2 \quad\text{and}\quad q_1^2 q_2^2 \cdots q_r^2 \cdots q_n^2,$$

and they will differ by $2nk$, whereas the standard deviation of the expansion is*

$$\sqrt{2n(\bar{p}\bar{q} - \sigma_p^2)}\,k,$$

where $\bar{p}$ is the mean of the $p$’s and $\bar{q}$ of the $q$’s (which we may take as ½ each), while $\sigma_p$ is the standard deviation of the $p$’s (which also equals $\sigma_q$).
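The standard deviation of the expansion follows from the contribution of a single gene. Each gene adds 0, $k$ or $2k$ with probabilities $q_r^2$, $2p_rq_r$ and $p_r^2$, so its variance is $2p_rq_rk^2$. Independent genes add their variances, and the average of $p_rq_r$ over the genes is $\bar p\bar q - \sigma_p^2$. The sketch checks this identity numerically for an arbitrary, purely hypothetical set of gene frequencies, which are not taken from Winter's data, and it assumes that the genes act independently.

```python
import math
from statistics import fmean, pvariance

k = 1.0                                  # hypothetical heterozygote effect
p = [0.1, 0.3, 0.5, 0.6, 0.85, 0.9]      # hypothetical gene frequencies
n = len(p)

# Direct route: each gene scores 0, k or 2k with probabilities q^2, 2pq, p^2,
# so its variance is 2*p*q*k^2; independent genes add their variances.
direct_var = sum(2 * pr * (1 - pr) * k**2 for pr in p)

# Formula route: sqrt{2n(p_bar*q_bar - sigma_p^2)} k, with sigma_p^2 the
# population variance of the p's.
p_bar = fmean(p)
q_bar = 1 - p_bar
formula_sd = math.sqrt(2 * n * (p_bar * q_bar - pvariance(p))) * k

print(math.sqrt(direct_var), formula_sd)  # the two routes agree
print("extreme range 2nk =", 2 * n * k)
```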

Now according to Prof. Fisher† the frequency distribution of the $p$’s is given by the equation $df = C\,dp/(pq)$ and, after some tedious algebra, I find that

$$\sigma_p^2 = \frac{1}{4} - \frac{N-1}{2N\,S_{N-1}(1/r)},$$

where $N$ is the number of loci (here $2 \times 163 = 326$) and $S_{N-1}(1/r)$ is the sum of $1/r$ from $r = 1$ to $N-1$; this reduces to $\tfrac{1}{4} - 0.0783$.
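The result of the algebra can be checked by brute force. The sketch assumes that Fisher's distribution is applied to the discrete frequencies $p = i/N$, $i = 1, \dots, N-1$, with weights proportional to $1/(pq)$. Under that assumption the mean of $pq$ is $(N-1)/(2N\,S_{N-1})$, because $1/\{i(N-i)\} = (1/N)\{1/i + 1/(N-i)\}$.

```python
def mean_pq(N: int) -> float:
    """Mean of p*q when p = i/N (i = 1..N-1) has weight proportional to 1/(p*q)."""
    ps = [i / N for i in range(1, N)]
    weights = [1 / (p * (1 - p)) for p in ps]
    return sum(w * p * (1 - p) for w, p in zip(weights, ps)) / sum(weights)


def mean_pq_closed_form(N: int) -> float:
    """(N - 1) / (2 N S), where S = 1 + 1/2 + ... + 1/(N - 1)."""
    harmonic = sum(1 / r for r in range(1, N))
    return (N - 1) / (2 * N * harmonic)


N = 2 * 163              # loci represented in Winter's 163 foundation ears
pq = mean_pq(N)
sigma_p_sq = 0.25 - pq   # p_bar = q_bar = 1/2 by symmetry
print(f"mean pq = {pq:.4f}, closed form = {mean_pq_closed_form(N):.4f}")
print(f"sigma_p^2 = 1/4 - {pq:.4f} = {sigma_p_sq:.4f}")
```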

* “An explanation of deviations from Poisson’s Law in practice”, Biometrika, xii (1919), p. 213 footnote [9, p. 67].
† Genetical Theory of Natural Selection (1930), p. 91.
Hence we have the standard deviation of the expansion

$$\sqrt{2n \times 0.0783}\,k,$$

and the ratio of the extreme range to the standard deviation

$$\frac{2nk}{\sqrt{2n \times 0.0783}\,k} = \sqrt{25.5\,n}.$$

Hence to determine $n$ we equate $\sqrt{25.5\,n} = 29$, giving $n = 33$.
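The last step inverts the ratio. Since $2nk/(\sqrt{2n \times 0.0783}\,k) = \sqrt{4n/(2 \times 0.0783)}$, the constant under the square root is $4/0.1566 \approx 25.5$, and $n = 29^2/25.5$. The sketch does this arithmetic. Applied to a ratio of 35, the same function gives the figure of 48 quoted later among the assumptions.

```python
import math

PQ_TERM = 0.0783  # p_bar*q_bar - sigma_p^2 under Fisher's distribution


def range_to_sd_ratio(n: float) -> float:
    """Ratio of the extreme range 2nk to the s.d. sqrt(2n * PQ_TERM) * k."""
    return 2 * n / math.sqrt(2 * n * PQ_TERM)


def genes_needed(ratio: float) -> float:
    """Solve range_to_sd_ratio(n) == ratio: ratio**2 = (4 / (2 * PQ_TERM)) * n."""
    return ratio**2 * 2 * PQ_TERM / 4


constant = 4 / (2 * PQ_TERM)   # the 25.5 in sqrt(25.5 n)
print(round(constant, 1))
print(round(genes_needed(29)))
print(round(genes_needed(35)))
```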

In this calculation the following assumptions have been made, which seem to 
me to be reasonable, and small departures from them will not seriously affect 
the result: 

(a) The distribution of the percentage of oil has been taken as normal.

(b) I have taken the genetic standard deviation as being appreciably constant
for the first three generations and have assumed that the difference between the 
high and low races at this point will be sufficiently accurate to give the genetic 
part of the variation. 

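To test this I have calculated the number of genes on the basis of

taking the first pair of selections, giving 33 genes
taking up to the second pair of selections, giving 25 genes
taking up to the third pair of selections, giving 33 genes
taking up to the fourth pair of selections, giving …
taking up to the fifth pair of selections, giving …

All numbers of much the same order.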

(c) I have assumed linear regression of genetic on total variation and independence between the genetic and environmental variation.

(d) I have assumed that the mean value of the $p$’s and $q$’s is ½.

(e) Following Fisher, I have assumed an equal distribution of the logarithm 
of the gene ratio. This should follow whether the gene is absolutely neutral or 
has a small selective advantage. My own feeling is that there must be a large 
class of variations which, if they occur in an individual at one end of the range, 
are favourable, but are unfavourable at the other. As the general distribution 
in the species tends to be broken up into local races with means more or less 
different from the general mean, genes will introduce themselves by mutation 
into such local races as are favourable to their retention and, when firmly established, into the main body of the species.

The following assumptions are such as to give a minimum value of n: 

(f) I have taken the minimum range as 8.41, i.e. I have not allowed for any genetic variation beyond the means of the last generations, whereas Winter actually found during the next eight years that the means were still moving apart.

If, for example, I had added even as little as three times the standard deviation outwards at each end, making 35 times the standard deviation, $n$ would have risen to 48.
(g) I have already mentioned that the assumption of equal effects from all the
genes minimizes n. 

(h) I have assumed absence of dominance. Clearly dominance would increase 
the standard deviation for the same range and so increase n. 

(i) I have, naturally, only been able to deal with such genes as were included in Winter’s sample of 163 heads, tracing back to, at most, 326 loci. Hence only quite a small proportion of the rarer genes can have been included, and, according to Fisher, far the greater number of genes consists of those which individually occur but seldom. Further, even of those genes included in the original sample, many must have been lost at random in the first few selections and so not have been taken into account by the calculation.

Lastly, the remaining assumptions cast an element of doubt on the whole 
calculation: 

(j) Although the standard deviation is correlated with the mean, so that we seem to be measuring variation at the low end of the distribution in smaller units than at the high end, I have taken the difference between the means of the high and low races as if the scale were uniform, and divided by the standard deviation determined at the middle of the scale. I suspect that this tends to exaggerate the difference and so $n$.

(k) I have assumed that the effect of the genes is additive, whereas they may
really obey some quite other law. 

Nevertheless, though I do not feel that the above calculation can be altogether 
absolved from the charge of “playing with figures”, I think that it does really 
afford some evidence that the oil percentage of Winter’s maize was conditioned 
by the presence, or absence, of a number of genes, at least of the order 20-40, 
possibly of 200-400, and not at all likely to be of the order 5-10. 

The 100-300 minimum of genes of the former paper has therefore been reduced 
to 20-40, but however few or many genes may have been present, the fact 
remains that Winter was able to select his maize races far outside the range of 
his original material. This seems to me to justify* 

the conception of species patiently accumulating a store of genes, of no value under 
existing conditions and for the most part neutralized by other genes of opposite sign. 
When, however, conditions change, unless too suddenly or drastically, the species 
finds in this store genes which give rise to just the variation which will enable it to
adapt itself to the change. 

It follows that the change appears to have produced the variation which it has 
merely selected from among those potentially present. Thus we can reconcile the 
view that the environment produces the required variation, with the older Darwinian selection of random variations, to which it appears at first sight to be diametrically opposed.

* “Evolution by selection. The implications of Winter’s selection experiment”, Eugen. Rev. xxiv (1933) [18, p. 185].




20 

CO-OPERATION IN LARGE-SCALE EXPERIMENTS 

[A Discussion, opened by Mr W. S. Gosset, at the meeting of the Industrial and Agricultural Research Section of the Royal Statistical Society, 26 March 1936. Sir Daniel Hall, K.C.B., F.R.S., in the Chair.]

[Supplement to J. Roy. Statist. Soc. iii (1936), p. 115]

At the outset I must confess that the title is to some extent misleading: co-operation is, I am quite sure, advantageous in all large-scale experiments whether
industrial or agricultural, but it happens that, though no farmer, I have only 
had first-hand experience of co-operation in agriculture and my paper must, 
therefore, deal with that. On the other hand, there are several Fellows present 
who will doubtless be able to draw analogies from agriculture to industry as the 
general principles of experimentation are common to both. 

Forty years ago agricultural experiments were mainly carried out in fairly 
large plots, generally without replication, and in consequence the soil differences 
between two plots which were to be compared were often so large as to obscure 
the issue. 

Then about thirty years ago, several different investigators harvested apparently uniform fields by small plots, and it at once became obvious that the
variation in fertility from point to point in a field is so distributed that to obtain 
the best experimental results it is necessary to work with a number of small 
plots. These should be arranged so that comparable plots lie close together, and 
it appeared further that this replication of plots enabled us to make an estimate 
of the error of our results in a single experiment; before this it had only been 
possible to estimate the error of a series of experiments carried out at a number 
of stations or in a number of years. 

Finally, about fifteen years ago, Prof. Fisher introduced the principle of 
randomizing the position of the plots in the various systems of randomized blocks 
and Latin squares with which many of you are familiar. This enabled us to 
obtain a certainly valid estimate of the variability of our results, though usually 
at the expense of increasing that variability when compared with balanced 
arrangements. 

Nevertheless, it must not be supposed that valuable results could not be obtained by the primitive methods of forty years ago; for example, in the 1880’s and 1890’s the Danes, working with comparatively large plots, with few replications, but at several co-operating stations and in a number of successive seasons, were able to establish that Prentice was the most suitable barley to grow in Denmark.

On the other hand, Mr Yates has pointed out that it is not uncommon, when 
using the most modern methods in manurial experiments, to obtain a significant 
result on one occasion, but, on repeating the experiment in another year or in 
another field, to get an equally significant result in the opposite direction. 

Nor is the reason of this far to seek; among the many causes which influence
the result of an experiment, we can only control by the arrangement of our plots 
those connected with the variation in fertility of the experimental area; apart 
from these we have the wide differences in soil and climate over the districts in 
which we wish to apply the conclusions which we draw from our experiments. 

Hence the old work, if repeated on a representative scale and sufficiently often, 
was able to give results which were applicable over a wide area, while the very 
accuracy of Mr Yates’s methods enables him to reach significance for results of 
merely local value. 

Nevertheless, it would be a mistake to reduce the accuracy by insufficient 
replication, for only by repeating such work at different times and places can the 
causes of such apparent anomalies be traced, and for that the more we can 
eliminate mere soil errors the better. 

But such repetitions can only be carried out co-operatively, and I propose to 
give some instances of such co-operation, beginning with the simplest technique. 

Just before the beginning of this century the Irish Agricultural Organization 
Society, which later became the Department of Agriculture, began a research 
into the most suitable variety of barley to grow in Ireland, and this research 
has been continued to the present day. During this time three varieties of barley 
have been introduced into Ireland, after adequate evidence had been obtained 
that each was better than the barley which it succeeded, and the methods of
seed distribution are such that after a very few years the new barley has replaced 
the old in practically all the barley-growing districts in Ireland. 

It is interesting to note that the first of the three barleys to be introduced was 
found to be identical with that which the Danes had proved to be most suitable 
for Denmark; the other two were obtained from it by cross-fertilization by
Dr Hunter. 

The resulting gain in yield has been remarkable, and though it would be easy 
to attach too much importance to evidence supplied by the official estimates, 
they tally fairly well with the claim which has been made, on the basis of the 
experimental plots, that there has been a gain of from 20 to 25 %. 

During the last ten years the official yield has dropped below 5 qr. only once, 
while only twice in the previous sixty years did it rise above that figure. 

The low yields between 1916 and 1925 were partly due to unfavourable weather, 
but also to the extension of arable land during the war, with consequent inclusion 

Table I

Yield of Barley in Ireland in Quarters per Acre

Before experimenting          After experimenting
1866-1870    4.0              1901-1905    4.5
1871-1875    4.1              1906-1910    4.7
1876-1880    3.9              1911-1915    4.8
1881-1885    3.9              1916-1920    4.1
1886-1890    3.9              1921-1925    4.1
1891-1895    4.3              1926-1930    5.3
1896-1900    4.3              1931-1935    5.1


of less suitable land, and to the subsequent decline in farming technique owing 
to wages being high compared with prices. 

The experiments are carried out at about ten centres where three varieties are 
tested against the standard variety in one-acre plots. This somewhat primitive 
arrangement has been carried on up to the present day in order to provide plenty 
of barley for quality tests. 

In any case, after some years the weather and the barley-growing land of the 
country were sampled in a way which would be impossible at a single station. 
The number of farms should, of course, be larger, and doubtless it would be but 
for the fact that only one official is available for supervision, and ten farms at 
distances of, in some cases, over 100 miles is as much as he can manage even 
when the experiment is of this very simple type. 

The error of a comparison between two one-acre plots is large, and quite a 
number of seasons pass before enough repetitions are available to reduce the 
error to a figure which will show that a new variety really yields better than the 
standard. As, however, it is as necessary to sample weather as districts, this is 
of no great disadvantage. 

The order of this error is of interest, and I have examined two series to determine 
it; the first was carried out between 1901 and 1906, when 51 comparisons between 
Archer and Goldthorpe gave an average advantage to Archer of 7.7 % with a standard error of a single comparison of 15.5 %. This tallies well enough with
the traditional 10 % for the error of a comparison of a pair of plots at one station, 
having regard to the further real variation due to the differential response of the 
varieties to soil, climate and farming technique. 

The second series was carried out between 1925 and 1935, when two selections 
of the Spratt-Archer cross were compared: they differed by 0.27 % in 103 trials with a standard error of 9.3 %.

These two estimates of the error of a comparison, 15.5 % and 9.3 %, differ
significantly, and it is noteworthy that the smaller figure was found with barleys 
which might be expected to react in much the same way to differences in soil 
and weather. 
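One way to check that 15.5 % and 9.3 % "differ significantly" is Fisher's z-test of a variance ratio. The text does not say which test was applied. This sketch assumes 50 and 102 degrees of freedom, one fewer than the numbers of comparisons. It also uses the large-sample approximation in which z is roughly normal with standard error $\sqrt{\tfrac12(1/\nu_1 + 1/\nu_2)}$.

```python
import math

s1, df1 = 15.5, 51 - 1   # Archer v. Goldthorpe, 1901-1906 (assumed d.f.)
s2, df2 = 9.3, 103 - 1   # two Spratt-Archer selections, 1925-1935 (assumed d.f.)

variance_ratio = (s1 / s2) ** 2
z = 0.5 * math.log(variance_ratio)             # Fisher's z
se_z = math.sqrt(0.5 * (1 / df1 + 1 / df2))    # approximate s.e. of z
print(f"F = {variance_ratio:.2f}, z = {z:.3f}, z / s.e. = {z / se_z:.1f}")
```

A ratio of z to its standard error of about 4 lies well beyond the usual limits, which agrees with the statement in the text.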
A second set of experiments has been carried out by the National Institute of Agricultural Botany, and I instance it to give an idea of the advantage of using a method which reduces the error at each station, namely Beaven’s half-drill strip.

It has been said that from an experiment conducted by this method no valid 
conclusion can be drawn, but even if this were so, it would not affect a series of 
such experiments. Each is independent of all the others, and it is not necessary 
to randomize a series which is already random, for, as Lincoln said, “you can’t 
unscramble an egg”. Hence, since the tendency of deliberate randomizing is to 
increase the error, a balanced arrangement like the half-drill strip is best if 
otherwise convenient. 

From this work I have taken two series, one of 22 comparisons between Spratt- 
Archer and Plumage-Archer barleys carried out from 1925 to 1928 when the 
former yielded 6.1 % more and the standard error of a comparison was 8.1 %.

There was, however, one experiment in which the method was not followed 
in several particulars, and if that be omitted the standard error falls to 5.6 %.

The second series of N.I.A.B. experiments was a comparison between Spratt- 
Archer and a selection from Plumage-Archer which was carried out at six stations 
and for three years. It is thus possible to analyse the variance, and though the 
numbers are too small to give a significant difference in variance, there is an 
indication that the greater part was connected with the stations. The average 
superiority in yield of Spratt-Archer was 8.2 % and the s.d. of a comparison was 8.4 %; this is significant for 18 comparisons, so that the main object of the
experiment was attained provided that the stations could be assumed to be a 
representative sample. 
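The claim that an 8.2 % superiority with an s.d. of 8.4 % is significant over 18 comparisons can be checked with Student's t. The sketch treats the 18 comparisons as independent with 17 degrees of freedom, which ignores the station component shown in the analysis of variance. The value 2.11 is the two-sided 5 % point of t for 17 d.f.

```python
import math

mean_diff = 8.2     # average superiority of Spratt-Archer, per cent
sd = 8.4            # s.d. of a single comparison, per cent
n = 18              # six stations x three seasons

se = sd / math.sqrt(n)
t = mean_diff / se
T_CRIT_17DF = 2.11  # two-sided 5 % point of Student's t with 17 d.f.
print(f"s.e. = {se:.2f}, t = {t:.2f}, significant: {t > T_CRIT_17DF}")
```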

The analysis of variance is as follows: 


             Degrees of    Sum of      Mean
             freedom       squares     square
Seasons           2          22.25      11.13
Stations          5         815.34     163.07
Remainder        10         352.26      35.23
Total            17        1189.85      69.99
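The analysis-of-variance table can be checked arithmetically. The components should add up to the total, and each mean square is the sum of squares divided by its degrees of freedom. The square root of the total mean square should reproduce the s.d. of a comparison of about 8.4 %.

```python
# Analysis of variance for the six-station, three-season N.I.A.B. series.
rows = {                      # source: (degrees of freedom, sum of squares)
    "Seasons": (2, 22.25),
    "Stations": (5, 815.34),
    "Remainder": (10, 352.26),
}

total_df = sum(df for df, _ in rows.values())
total_ss = sum(ss for _, ss in rows.values())
print(f"Total: {total_df} d.f., sum of squares {total_ss:.2f}")

for source, (df, ss) in rows.items():
    print(f"{source:<10} mean square {ss / df:8.3f}")

# Square root of the total mean square: the s.d. of a single comparison.
print(f"s.d. of a comparison = {(total_ss / total_df) ** 0.5:.2f}")
```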


The remainder, of course, includes not only the error due to soil differences, 
but also those due to the local differences in climate within each season and to 
the difference between the fields used at each station. 

I have drawn attention to this small series because it indicates the possibility, 
had there been sufficient stations, of connecting the peculiarities of the soil and 
weather at the stations with the relative yields of the varieties. Thus there was 
an indication that Spratt-Archer was less superior to Plumage-Archer when the 
yields were high, but it was by no means significant. 

Assuming, then, that the error of the one-acre plot experiment is of the order 12 % and that of the half-drill strip 8 %, the advantage of the latter is not so much that fewer experiments would be needed to evaluate a given difference in yield, for in any case it is necessary to spread one’s net widely both in time and space; nor is the smaller area occupied a clear gain, for it is offset by the necessity
for closer supervision; but it does make it possible to contract the limits of 
significance so that more series of experiments give definite answers to the 
questions asked. 
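To show what contracting the limits of significance means, the sketch compares the approximate limits for the mean of $n$ comparisons. It takes the error of a single comparison as 12 % for one-acre plots and 8 % for half-drill strips. Taking the limit as ±2 standard errors is an assumption made for the illustration, and so are the values of $n$.

```python
import math

# Error of a single comparison, per cent, as assumed in the text.
errors = {"one-acre plots": 12.0, "half-drill strips": 8.0}

for n in (10, 20, 40):  # hypothetical numbers of comparisons
    # Approximate limits of significance for the mean: +/- 2 standard errors.
    limits = {design: 2 * err / math.sqrt(n) for design, err in errors.items()}
    summary = ", ".join(f"{d} +/-{lim:.1f} %" for d, lim in limits.items())
    print(f"{n} comparisons: {summary}")
```

With 10 comparisons, for example, the limits are roughly ±7.6 % for the one-acre plots against ±5.1 % for the half-drill strips. A difference of 6 % would therefore fall inside the first pair of limits but outside the second.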

I have instanced the half-drill strip, but obviously any method of reducing 
the error is of advantage, whether it is by replication (including, for instance, 
multiple Latin squares), reduction of the size of plot, or regular balanced arrangement. ...

The instances given above have been fairly simple, inasmuch as the differential 
response of barleys to variations of soil and climate is small; but even in these 
cases it would have been of advantage to have spread the net wider: the next 
experiment to which I am going to refer is of a more complicated nature, and is 
concerned with the response of sugar-beet to artificial manures. 

This has been described in the Rothamsted Report for 1934, and though I do 
not propose to try to add to the full analysis given therein, a short account of 
it may be instructive. 

The experiment was carried out in two seasons, at 13 stations in 1933 and 
15 in 1934; all combinations of three manures at three rates per acre were tried, 
and measurements of the weights of roots and tops, and of percentage of sugar 
and purity, were made, and various conclusions were drawn as to the effects of 
the manures. Among others, it appeared that some of these effects differed 
significantly at different farms. 
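The size of such a design is easy to understate: every combination of three manures, each at three rates, gives 3³ = 27 treatments at each centre. The Python sketch below enumerates them; the manure labels and the coding of rates as 0, 1 and 2 are hypothetical placeholders, since the text does not name them here.

```python
from itertools import product

# Hypothetical labels; the text says only "three manures at three rates".
MANURES = ("manure_1", "manure_2", "manure_3")
RATES = (0, 1, 2)  # coded rate levels (assumed coding)

# One treatment = one rate chosen for each manure.
treatments = [dict(zip(MANURES, combo)) for combo in product(RATES, repeat=len(MANURES))]

print(len(treatments))
print(treatments[0], treatments[-1])
```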

The next thing, clearly, is to connect up these differences with the character 
of the soil and weather at the various farms, but though mechanical and chemical 
analyses of the soil were carried out, there is no mention in the report of any 
attempt to do this. Presumably there was no marked connexion, and further 
results are awaited, for if “8 of the 15 centres gave significant increases in yield
of roots with sulphate of ammonia, while the remaining 7 centres showed no
appreciable increases”, the value of the result to the individual farmer will be
much increased by some indication of whether his land is to be classed with the
8 or the 7. I call attention to this in no spirit of criticism, but in order to bring
out the full possibilities of co-operation on a still larger scale.

Both Dr Beaven and the Rothamsted school have maintained that their 
methods can be carried out by the ordinary farmer; and if for ordinary you 
substitute exceptional, I agree; but the business, even of the exceptional farmer, 
is to farm, and he cannot afford the time to weigh up small experimental plots 
when he ought to be getting on with his work. 
And so, while a co-operative series of experiments should always include a
majority carried out on ordinary farms, there must be trained supervision and
cultivation money, and this can only come from the Government, working through
institutions like the National Institute of Agricultural Botany or Rothamsted.

Furthermore, the more complicated the method, the more supervision is
required: one man can just look after ten experiments with acre plots; with
half-drill strips you probably want at least three, and for more complicated
experiments even more; but farming is a large industry, and a gain, even a small
gain, per acre on 100,000 acres soon pays for the cost of making experiments.

APPENDIX 

The Error of Half-Drill Strip Experiments 

The half-drill strip technique has been criticized on the ground that no valid 
conclusion can be drawn from experiments carried out by it, and it may be well 
to examine what truth there is in the assertion. 

Essentially the method consists in sowing long narrow strips of two varieties
of cereals in alternation. By an ingenious arrangement at sowing, these strips
can be split longitudinally at harvest, and each half strip of one variety is
compared with the half strip of the other adjacent to it; to balance the linear term
of the fertility slope, the series begins and ends with a half strip of the same
variety. The series is therefore of the form ABBA ABBA ... ABBA, and to
calculate the error of the difference (A - B) a degree of freedom is allocated to the
fertility slope. This is determined by the difference $\{S(AB) - S(BA)\}/n$, where
$S(AB)$ is taken to be the sum of A - B for all the comparisons AB, $S(BA)$ for
all the comparisons BA, and n is the number of pairs.

Thus the analysis of the variance is given in a table of the form:*

                   Degrees of   Sum of squares
                   freedom
Fertility slope        1        $\{S(AB) - S(BA)\}^2/n$
Random error         $n-2$      $S(A-B)^2 - \{S(AB) - S(BA)\}^2/n$
Total                $n-1$      $S(A-B)^2$


If, then, the variation in fertility consisted of random deviations superposed 
on a uniform fertility slope, the procedure would be beyond criticism; it remains 
to be seen how departures from such an ideal system invalidate the argument. 
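The analysis in the table can be written out as a short Python sketch. It assumes the comparisons are listed in field order, so that in a series ABBA ABBA ... ABBA the even-numbered comparisons are AB and the odd-numbered ones BA; following the editor's footnote to the table, the total is taken about the mean difference, which gives the n - 1 degrees of freedom shown. The function name and the returned fields are our own.

```python
import math


def half_drill_strip_anova(diffs):
    """Analysis of variance for half-drill strip comparisons.

    diffs: values of A - B in field order for a series ABBA ABBA ... ABBA,
    so diffs[0], diffs[2], ... are AB comparisons and diffs[1], diffs[3], ...
    are BA comparisons.
    """
    n = len(diffs)
    if n < 4 or n % 2:
        raise ValueError("expected an even number (at least 4) of comparisons")
    s_ab = sum(diffs[0::2])  # S(AB)
    s_ba = sum(diffs[1::2])  # S(BA)
    mean = (s_ab + s_ba) / n
    total_ss = sum(d * d for d in diffs) - n * mean ** 2  # n - 1 d.f.
    slope_ss = (s_ab - s_ba) ** 2 / n  # 1 d.f. for the fertility slope
    error_ss = total_ss - slope_ss  # n - 2 d.f.
    error_ms = error_ss / (n - 2)
    return {
        "mean": mean,
        "slope_ss": slope_ss,
        "error_ss": error_ss,
        "se_mean": math.sqrt(error_ms / n),
    }


# Invented numbers for illustration: true difference about 10, fertility rising to the right.
result = half_drill_strip_anova([7, 14, 6, 13, 8, 12, 7, 13])
print(result)
```

With a perfectly uniform slope and no other variation, for example the comparisons `[7, 13] * 4`, the slope degree of freedom absorbs everything and the random error is exactly zero, which is the ideal case described above.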

The almost universal departure is that the fertility slope is not uniform; there
are, ideally speaking, parabolic terms, so that the position AB represents a
different advantage to A at different points in the series. This will have the effect
of increasing the apparent error, since the sum of the squares of the differences,
$S(A-B)^2$, includes just as large a component due to the fertility slope, while
the component calculated, $\{S(AB) - S(BA)\}^2/n$, is smaller; this is because the
sign of AB (and of BA) changes on passing from a falling to a rising part of the
curve. On the other hand, there is a corresponding increase in the real error
owing to the fertility slope not being accurately balanced, this error amounting
at most to $2/n$ of the fertility slope between a pair for each change of direction.

* [In this table it seems necessary to read $S(A-B)^2 - n(\bar{A}-\bar{B})^2$ for $S(A-B)^2$. Ed.]

Furthermore, unless the fertility slope is of a periodic nature, a case to be 
considered later, the incidence of these changes of curvature will be random, 
so that the general tendency will be slightly to over-estimate the error, a fault 
on the right side for most of us, and one which is compensated by the smallness 
even of the apparent error. 
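The effect of a change of direction can be seen in a deterministic toy example (our construction, not from the source): sixteen half strips laid out ABBA ABBA ABBA ABBA, no real varietal difference, and fertility either rising steadily or falling and then rising in a symmetrical V. With the steady slope the whole effect is taken out by the fertility-slope degree of freedom. With the V, the AB and BA contributions cancel, nothing is removed, and the same local slope appears entirely as apparent error; in this symmetrical case the mean difference happens to be unaffected.

```python
def comparisons(fertility, n_pairs, true_diff=0.0):
    """A - B for adjacent half strips in the series ABBA ABBA ... (positions 0, 1, 2, ...)."""
    pattern = "ABBA"
    diffs = []
    for k in range(n_pairs):
        left, right = 2 * k, 2 * k + 1
        if pattern[left % 4] == "A":  # AB comparison
            a_pos, b_pos = left, right
        else:  # BA comparison
            a_pos, b_pos = right, left
        diffs.append(true_diff + fertility(a_pos) - fertility(b_pos))
    return diffs


def slope_and_error(diffs):
    """Mean difference, fertility-slope SS and residual SS (total taken about the mean)."""
    n = len(diffs)
    s_ab, s_ba = sum(diffs[0::2]), sum(diffs[1::2])
    mean = (s_ab + s_ba) / n
    total_ss = sum(d * d for d in diffs) - n * mean ** 2
    slope_ss = (s_ab - s_ba) ** 2 / n
    return mean, slope_ss, total_ss - slope_ss


def straight(x):
    return float(x)  # fertility rising uniformly to the right


def v_shape(x):
    return abs(x - 7.5)  # falls towards the middle of the field, then rises


for name, curve in (("straight", straight), ("V-shaped", v_shape)):
    print(name, slope_and_error(comparisons(curve, n_pairs=8)))
```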

Periodic fertility slopes may undoubtedly occur, but apart from those due to 
the works of man, they must be so rare as to add a negligible risk; where, however, 
they are due to such causes as old ploughman’s “lands”, it should be possible 
to avoid them by inspection; even if they have been overlooked, the chance of 
their affecting the mean difference is small, for to do so the period must very 
nearly coincide with an odd multiple of the width of a whole strip; in general, 
it is the apparent error that would be increased. 

We may therefore conclude that there is a slight tendency for the error of a
half-drill strip experiment to be over-estimated, so that somewhat fewer
significant results are obtained than if the real error could be accurately determined;
this is more than made up for by the smallness of the error itself as compared
with that of most other arrangements.

There remain two other criticisms; firstly, that the system of drilling is such 
that half the coulters of the drill are allocated to one of the varieties and the 
rest to the other; if, then, the coulters on one side are badly set or stopped up, 
the other may have a constant advantage. This, though a real possibility, and 
one to be guarded against by careful inspection, is not as serious as it sounds, 
at all events with barley; for barley automatically fills up gaps to such an 
extent that the alteration in yield by large changes in seeding rates is almost 
inappreciable, so that within wide limits of faulty seeding it is the area devoted 
to the variety which counts, and not the exact distribution of seed within it. 

The other criticism has more substance; by the half-drill strip method only 
two varieties are directly compared. This is just what is wanted where a standard 
variety or rate of manuring is to be compared with a competitor for the rank 
of standard; but if two or more varieties are to be compared with the standard, 
their inter-comparison is, of course, subject to a much greater error. 

Up to the present, the half-drill strip method has, as far as I know, only been 
used for cereals in these Islands and in New Zealand, but it should be equally 
useful for such manures as can be drilled, and a modification has even been 
suggested for a forest experiment. 
COMPARISON BETWEEN BALANCED AND RANDOM
ARRANGEMENTS OF FIELD PLOTS

[Biometrika, xxix (1938), p. 363]

[The following editorial note was printed at the head of this paper: With very deep regret 
the Editorial Committee has to report the death, on 16 October 1937, of Mr W. S. Gosset, 
whose scientific contributions under the pseudonym of “Student” are well known to all 
statisticians. It is hoped to include some account of his life and work in the next issue of
the Journal.

Mr Gosset had been working at the following paper during the past summer, and a 
fortnight before his death had discussed the draft, which is printed below, with Dr J. 
Neyman and Prof. E. S. Pearson. It was then agreed that certain points in sections 2 and 3 
needed clarification and Mr Gosset proposed to undertake this work himself; unfortunately 
this final revision was never completed. Dr Neyman and Prof. Pearson have therefore 
added in a separate Note (Biometrika, xxix (1938), pp. 380-88) some comments, for which
they take full responsibility, regarding the points on which they know Mr Gosset had 
intended to enlarge.] 

In a paper read before the agricultural and industrial section of the Royal
Statistical Society* I ventured to point out that the advantages of artificial
randomization are usually offset by an increased error when compared with
balanced arrangements. Prof. Fisher does not agree and has written a paper to
test the difference of opinion that there is between us.†

In this paper I propose to set out as clearly as I can just what is this difference 
of opinion. 

Next I propose to show that the conclusions of Prof. Fisher’s paper all follow,
firstly, from his having made use of a method of calculating the error of the
“systematic” arrangements which I showed fourteen years ago would lead to just
the misleading conclusions which he has found, and secondly from his not having
compared like with like.

Thirdly, I will show that if he had not fallen into these pitfalls he would have 
been able to show that in the case which he took, a balanced arrangement does in 
fact give a slightly smaller error than his randomized one. 

Fourthly, I will describe just what is to be expected when balanced arrangements
are compared with random,‡ viz. that when the variance due to treatment
is low compared with the error of the experiment, fewer significant results are
obtained than with random arrangements, but when the variance due to treatment
is high more significant results are obtained with balanced arrangements.

* W. S. Gosset, “Co-operation in large-scale experiments”, Supplement to J. Roy. Statist.
Soc. iii (1936), pp. 115-22, [20].

† Barbacki and Fisher, “A test of the supposed precision of systematic arrangements”,
Ann. Eugen. vii (1936), pp. 189-93.

‡ Note that an arrangement can be both balanced and random, and where this is
practicable the aims of Prof. Fisher and myself are both satisfied.

Lastly, I will give in an appendix the results of some testing of balanced versus 
random arrangements on uniformity trials by Mr A. W. Hudson of Massey 
College, N.Z. 

§ 1. The Effect of Lack of Randomness on Bias 

It is almost invariably necessary, when applying mathematics to practical 
affairs, to replace the actual conditions by a set of simpler approximations with 
which the mathematics are capable of dealing, and mathematical statistics are no 
exception to this rule. 

For example, the analysis of variance which is generally used to determine the 
error of agricultural experiments requires three assumptions to be made before 
we can apply the method strictly: 

(1) The systems concerned are to have normal variation. 

(2) The variances of like things should be equal. 

(3) The sampling should be random. 

(1) If, as is usual, the variation is not normal, our argument will not be
impaired unless the number of replications is very small, when departure from
normality introduces an added uncertainty to the estimation both of mean and,
perhaps even more, of variance.

(2) If, as often happens, the variances are not equal, as for example when we 
are pooling the variances of the yields of barleys which react differently to soils 
of different fertility, we shall not in general invalidate our conclusions appreciably, 
though in extreme cases attention should be paid to this source of error. 

(3) If, however, the sampling be not random, there are such possibilities of 
drawing false conclusions that Prof. Fisher has introduced a system of artificial 
randomizing to ensure that the third condition is satisfied and brands all other 
systems invalid. 

Nevertheless, it is possible, by balancing sources of error which would otherwise
lead to bias, to obtain arrangements of greater precision which are nevertheless
effectively random, by which I mean that the departure from randomness is only 
liable to affect our conclusions to the same sort of extent as do departures from 
normality or inequality of variances. 

Lack of randomness can affect either the mean or the variance, and it is the 
first of these which is apt to lead to invalid conclusions. Thus Mr Yates has shown 
that it is practically impossible for anyone to select shoots of corn of average 
length by eye, and in fact none of the senses can be trusted to behave without 
bias. Those of taste or smell are peculiarly liable, and if comparisons are to be made
it is necessary to avoid giving the least inkling of the order in which the samples
are to be presented; in fact it is better to let it be known that it is a random order.
In some cases the only way of avoiding bias is to withhold all knowledge of the
object of the investigation from those taking part, though unfortunately this
engenders a lack of interest in the proceedings.

Again, a promising experiment in nutrition was ruined by departure from 
randomness when the schoolmasters were allowed to adjust the supposed uneven 
effects of a chance selection of subjects for the Lanarkshire Milk Experiment, and 
in doing so managed to select, doubtless from the most humane motives, 10,000
children to receive milk who were significantly lighter and shorter than the 
10,000 “controls” who did not. 

In agricultural experiments there are obvious possibilities of bias affecting 
the mean in badly arranged experiments, for it is usual to find “fertility slopes” 
in most “uniformity” experiments, i.e. when an apparently uniform field is 
harvested in small-sized plots it is usual to find that the yield is higher in some 
parts than in others and tends to change more or less gradually from one place to 
another. Hence if plots of one variety are sited, whether systematically or by 
chance, nearer to one end of the experimental area than to the other, the mean 
is likely to be biased. 

To take the simplest case of two varieties or treatments, the layouts

    ABABABAB (systematic)
and ABAABABB (random)

will both favour B if the field is more fertile on the right than on the left hand, the
second rather more than the first.

On the other hand the layout 

ABBAABBA

is balanced with regard to a simple “linear” fertility slope, and the mean of 
neither A nor B will be biased except by departure from linearity. 
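Under a purely linear fertility slope the bias of each layout is simply the difference between the average positions of the B and A plots. The Python sketch below assumes, for illustration only, plots numbered 0 to 7 from left to right and fertility rising one unit per plot; it reproduces the statements above: both ABABABAB and ABAABABB favour B, the second more, and ABBAABBA is unbiased.

```python
def linear_slope_bias(layout):
    """Advantage to B under a fertility slope rising one unit per plot to the right.

    Equals (mean position of B plots) - (mean position of A plots).
    """
    a = [i for i, v in enumerate(layout) if v == "A"]
    b = [i for i, v in enumerate(layout) if v == "B"]
    return sum(b) / len(b) - sum(a) / len(a)


for layout in ("ABABABAB", "ABAABABB", "ABBAABBA"):
    print(layout, linear_slope_bias(layout))
# ABABABAB 1.0
# ABAABABB 2.0
# ABBAABBA 0.0
```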

It is, of course, possible to imagine particular variations in soil fertility which 
will bias the means of plots arranged in this manner, but with one exception they 
are of the same nature and lead to the same sort of bias—but usually to a smaller 
extent—as occurs with artificially randomized layouts. 

The one exception is a periodic wave of fertility due to previous cultivations
which happens to coincide in period with the width of an odd integral number of
quartets, a not particularly likely occurrence.

Such layouts as ABBA are termed balanced, and any number of treatments 
may be set in a balanced layout, as, for example, in the Latin square which is not 
only balanced but random as well, “thus conforming to all the principles of 
allowed witchcraft”. 

It is reasonable to expect that balanced layouts will on the whole be successful
and that the mean will be less biased than in random, and this expectation is
illustrated by some experimental sampling carried out by Mr A. W. Hudson of
Massey College, N.Z., who tested balanced and random blocks against one another
on three different uniformity trials. His results are given in the Appendix, and all
that need be said here is that in fifteen experiments the balanced layouts showed
slightly more bias in three and less in twelve, the reduction of bias being very
considerable in some of the twelve.*

And this brings me to a question which has often interested me. Suppose there
are two treatments to be randomized (I take two for simplicity only), and
suppose that by the luck of the draw they come to be arranged in a very unbalanced
manner, say AAAABBBB: is it seriously contended that the risk should be
accepted of spoiling the experiment owing to the bias which will affect the mean if
there is the usual fertility slope? For, as will be shown later, not only will the
mean be biased, but the apparent precision will tend to be high, and misleading
conclusions will be drawn much more often than the 1 or 5 % of the tables. It is of course
perfectly true that in the long run, taking all possible arrangements, exactly as
many misleading conclusions will be drawn as are allowed for in the tables, and
anyone prepared to spend a blameless life in repeating an experiment would doubtless
confirm this; nevertheless it would be pedantic to continue with an arrangement
of plots known beforehand to be likely to lead to a misleading conclusion.

Let us suppose therefore—as indeed it is rumoured—that common sense 
prevails and chance is invoked a second time and that such an arrangement as 
BBABBAAA is offered; is this to be accepted? It is more likely to give a biased 
mean than BABABABA, but then of course it is random! 
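The dilemma can be made concrete by scoring every one of the 70 ways of placing four A and four B plots in a row. As an illustrative assumption, bias is measured under a purely linear slope rising one unit per plot (plots numbered 0 to 7) as the mean position of the B plots minus that of the A plots, so a negative value favours A. Averaged over all draws the bias is exactly zero, as the tables assume, yet individual draws such as AAAABBBB or BBABBAAA carry a large bias that is known as soon as the draw is made.

```python
from itertools import combinations

N_PLOTS = 8


def bias_for(layout):
    """Advantage to B under a unit linear slope: mean B position minus mean A position."""
    a = [i for i, v in enumerate(layout) if v == "A"]
    b = [i for i, v in enumerate(layout) if v == "B"]
    return sum(b) / len(b) - sum(a) / len(a)


def all_layouts(n=N_PLOTS):
    """Every arrangement of n/2 A plots and n/2 B plots."""
    for a_positions in combinations(range(n), n // 2):
        yield "".join("A" if i in a_positions else "B" for i in range(n))


biases = {layout: bias_for(layout) for layout in all_layouts()}
average = sum(biases.values()) / len(biases)

for layout in ("AAAABBBB", "BBABBAAA", "BABABABA", "BBABABAA", "ABBAABBA"):
    print(layout, biases[layout])
print("draws:", len(biases), "average bias:", average)
```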

And if this is not to be used, how about BBABABAA? In short, there is a
dilemma—either you must occasionally make experiments which you know 
beforehand are likely to give misleading results or you must give up the strict 
applicability of the tables; assuming the latter choice, why not avoid as many 
misleading results as possible by balancing the arrangements? And this, to do 
Prof. Fisher justice, is the direction towards which he is tending; in his paper with 
Dr Barbacki he treats for the first time of “randomized sandwiches” to which 
the objection is, not an appreciable increase of error, but the practical difficulty 
of working them. 

To sum up, lack of randomness may be a source of serious blunders to careless 
or ignorant experimenters, but when, as is usual, there is a fertility slope, balanced 
arrangements tend to give mean values of higher precision compared with artificially
randomized arrangements.

Next, what is the effect of lack of randomness on the variance? 

In a later section I will show that since in the “null” case, i.e. when no real
treatment differences exist, the aggregate variance due to “treatments” and
residual error is constant for all arrangements of treatments in the blocks, those
with low actual error necessarily give high calculated values for the error and vice
versa, the calculated error, however, varying much less than the actual in ordinary
experiments owing to the larger number of degrees of freedom of the residual error.

* Mr Borden, of Hawaiian Sugar Planters’ Association, Hawaii, has obtained similar
results in similar experiments, and I have no doubt that this will always tend to happen.

This, of course, has nothing to do with the origin of the experiment whether 
randomized or not. 
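The fact on which the argument rests, that in the null case the treatment and residual sums of squares always add to the same total whatever the arrangement within blocks, can be checked numerically. The Python sketch below uses simulated "uniformity" yields (invented numbers: five blocks of four plots, fixed random seed) and redraws dummy treatments within blocks 200 times. The residual is computed directly from the two-way model rather than by subtraction; the treatment sum of squares varies from draw to draw, but treatment plus residual stays equal to the total minus the block sum of squares.

```python
import random


def block_partition(yields, assignment):
    """Treatment SS, residual SS and (total SS - block SS) for a randomized-block layout.

    yields[b][p] is the yield of plot p in block b; assignment[b][p] is the
    dummy treatment on that plot, each treatment appearing once in every block.
    """
    n_blocks, k = len(yields), len(yields[0])
    grand = sum(map(sum, yields)) / (n_blocks * k)
    block_means = [sum(row) / k for row in yields]
    total_ss = sum((y - grand) ** 2 for row in yields for y in row)
    block_ss = k * sum((m - grand) ** 2 for m in block_means)

    by_treatment = {}
    for row, labels in zip(yields, assignment):
        for y, t in zip(row, labels):
            by_treatment.setdefault(t, []).append(y)
    treat_means = {t: sum(v) / len(v) for t, v in by_treatment.items()}
    treatment_ss = n_blocks * sum((m - grand) ** 2 for m in treat_means.values())

    # Residual computed directly from the two-way model, not by subtraction.
    residual_ss = sum(
        (y - block_means[b] - treat_means[t] + grand) ** 2
        for b, (row, labels) in enumerate(zip(yields, assignment))
        for y, t in zip(row, labels)
    )
    return treatment_ss, residual_ss, total_ss - block_ss


rng = random.Random(1)
# Simulated uniformity yields (invented): five blocks of four plots, no treatment effects.
yields = [[rng.gauss(100.0, 10.0) for _ in range(4)] for _ in range(5)]

treatment_values, aggregates, fixed = [], [], []
for _ in range(200):
    assignment = [rng.sample(range(4), 4) for _ in yields]  # fresh draw within each block
    t_ss, r_ss, total_minus_blocks = block_partition(yields, assignment)
    treatment_values.append(t_ss)
    aggregates.append(t_ss + r_ss)
    fixed.append(total_minus_blocks)

print("treatment SS ranges from", min(treatment_values), "to", max(treatment_values))
print("treatment + residual ranges from", min(aggregates), "to", max(aggregates))
```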

If, however, the arrangement is “randomized”, one can, before the draw,
state accurately, subject to normality, etc., what the chance of getting any
particular partition of variance between “treatment” and “residual error” will
be in the “null” case. After the draw, when one particular arrangement has been
chosen, it is often possible to be sure that the chance has changed in one direction
or another without, however, being able to define exactly what it is.* In
particular, balanced arrangements tend to have lower actual errors and higher
calculated errors than would be expected by chance before a random selection is made,
and this is so even if a degree of freedom is allocated to fertility slope, owing to
the departure of the “slope” from linearity.

The consequence is that balanced arrangements more often fail to describe 
small departures from the “null” hypothesis as significant than do random, 
though they make up for this by ascribing significance more often when the 
differences are large. 

Thus such departures from the “null” hypothesis as are found to be significant 
by balanced are likely to be larger than those found by randomized arrangements, 
and in particular those discovered in the “null ” case itself—5 or 1 % as the case 
may be—tend to disappear altogether with balanced arrangements. 

It will be seen then that the difference between Prof. Fisher and myself is not
a matter of mathematics (heaven forbid) but of opinion. He holds that balanced
arrangements may or may not lead to biased means according to the lie of the 
ground, but that in any case the value obtained for the error is so misleading that 
conclusions drawn are not valid, while I maintain that these arrangements tend 
to reduce the bias due to soil heterogeneity and that so far from the conclusions not 
being valid they are actually less likely to be erroneous than those drawn from 
artificially randomized arrangements. Further, that in the really important 
agricultural experiments which are carried out at more than one centre—and it 
was of these that I was speaking—the very slight disadvantage that an occasional 
result at an individual station may not be recognized as significant owing to 
over-estimation of the error at that station is more than offset by the greater 
precision of the experiment as a whole. 

* This is analogous to the use of a life table to give the expectation of life. Thus the 
expectation of life of an Englishman of 40 can be referred to an appropriate table, but 
when we particularize the Englishman of 40 as a tin-miner or an agricultural labourer we 
know that the expectation is lower or higher than that given in the table without perhaps 
knowing very exactly by how much. 
§ 2. Barbacki and Fisher

Such being our opinions, based in each case on a priori argument, Prof. Fisher 
rightly decided to put the matter to the test by assigning imaginary treatments to 
plots of which the yield had been determined in a uniformity experiment both on 
a random and on a balanced system, and published a paper,* of which he gives
the following summary: 

“1. This inquiry was carried out to test the truth of the opinion expressed by
‘Student’ that randomization achieves its object ‘usually at the expense of increasing
the variability when compared with balanced arrangements’, and that one of the
means available to experimenters of reducing the error is by adopting a ‘regular
balanced arrangement’.

“2. Using an extensive uniformity test it is found that the arrangements
randomizing either pairs or sandwiches of half-drill strips give smaller errors than the
systematic arrangement advocated as more precise.

“3. As a consequence experimenters using the systematic arrangements
systematically underestimate their errors.

“4. The error estimated from a systematic arrangement is ambiguous, and the 
experimenter has an arbitrary choice between several widely different estimates. 

“5. Owing to the failure to furnish a valid estimate of error, ‘Student’s’ test of 
significance is not approximately correct for systematic arrangements.” 

The particular arrangement which Prof. Fisher intended to test was the
Half-Drill Strip† introduced by Dr Beaven some fourteen years ago and widely used
since then, but unfortunately half-drill strips are too large to lend themselves
easily to testing on ordinary uniformity trials, and although Prof. Fisher has
laid out eight pairs of half-drill strips on his uniformity trials he has not in fact
compared them with a corresponding random arrangement but has cut them up
transversely into 5-yard lengths and has compared the actual error of the large
half-drill strips with that calculated from the randomized‡ sheaf weights of which
they are composed.

Now it happens that Dr Beaven had originally proposed to calculate the error 
of the half-drill strip from sheaf weights of this kind, and that I pointed out in 
this Journal thirteen years ago§ that since such “sheaf weights” may be positively
correlated such a method of calculating the error is fallacious. 

* Barbacki and Fisher, “A test of the supposed precision of systematic arrangements”,
Ann. Eugen. vii, pp. 189-93.

† Prof. Fisher prefers to call this the “Split Drill” Method, but though I agree that the
name is more descriptive it is a pity to confuse the matter by a change of name after all
these years. More particularly is it confusing to transfer the name “Half-Drill Strip” to
small portions of the original half-drill strip as he has done, and I have called them by
Dr Beaven’s name of “Sheaf Weights”.

‡ Not very much randomized; he compares corresponding pairs just as anyone else would.

§ “On testing varieties of cereals”, Biometrika, xv (1923), pp. 271-93, [11].
This method of calculating the error has, of course, nothing to do with
balanced arrangements, except that it was proposed by Dr Beaven, the author 
of the half-drill strip; it might just as well be applied to random arrangements, as, 
for example, the “randomized pairs” of Prof. Fisher’s experiment, each of which 
was actually harvested in six separate drills from which the error could have been 
equally erroneously calculated. 

Prof. Fisher has therefore calculated the error of the half-drill strip by a method 
which I showed thirteen years ago would be likely to give a fallaciously low value, 
and quite rightly has not used this method to calculate the error of his “randomized 
pairs”: it is entirely due to this that he can draw conclusion (2) of his summary. 

From this single fallacious conclusion he boldly generalizes to reach conclusion 
(3) which, as was shown by O. Tedin whom he quotes, is directly at variance with
the facts. Conclusion (5) also follows solely from Prof. Fisher’s faulty method and 
not from the balanced arrangement. 

When the paper appeared I wrote a letter to Nature * pointing this out, and that 
the actual error of the half-drill strip aggregate was in good conformity with that 
calculated from the weights of the whole strips. 

In answering me Prof. Fisher replied that in that case the error of the
“randomized sheaf weights” was so much smaller than that of half-drill strips
that eleven times the area would have to be used to reduce the error of half-drill
strips to that of “randomized sheaf weights”, and he further repeated his
conclusion (4), with which I shall deal later.

Now one of the things that was noticed when uniformity trials first began was 
that the same piece of land laid out in large plots gave a very much larger error 
than if subdivided into small plots, and since half-drill strips were in this trial 
twelve times as large as “sheaf weights”, Prof. Fisher’s conclusion naturally 
follows since he is not comparing like with like. 

Yet even so, those who have actually had to carry out agricultural experiments 
might very well prefer to work eleven times the area with ordinary agricultural 
methods and tools than have to sow and harvest 192 “randomized sheaf weights ”, 
if indeed that could be done at all under ordinary weather conditions. 

Nevertheless, it is a fact that the error of this particular set of half-drill strips 
is unusually large. This arises partly because the number of repetitions is low but 
chiefly from the fact that the uniformity trial which Prof. Fisher chose to illustrate 
his argument showed a rather unusual feature due to faulty technique. 

An examination of the original drills which were condensed to form the
half-drill strips shows a periodicity; the averages of each of the eight drills in a set,
taken over the fifteen repetitions, are:

673g 7200 7839 67g5 6689 7478 68g7 66g7

These variations are obviously not due to chance (for instance, the third drill
gave the highest yield in twelve of the sets of eight and the second highest in the other
three) and are doubtless connected with some defect in the seed drill; probably the
tines were not evenly spaced, and this could possibly have been detected had it
occurred to Mr Wiebe to examine the working of the drill before sowing.

* [See pp. 218-19 below. Ed.]

The result is that since six of the eight drills were added up to form a “half-drill
strip”, then one drill omitted, and then another six, and so on, there was a
periodic variation in fertility not coinciding in period with the width of the
half-drill strip, and this, as I pointed out in the Appendix to my Royal Statistical
Society paper, increases the calculated error but does not bias the mean.

For the same reason the correlation between the corresponding sheaf weights 
is very much higher than would usually be the case and full scope is thereby given 
to Prof. Fisher’s faulty method of calculating the error. 

Let us now deal with Prof. Fisher’s fourth conclusion: “The error estimated 
from a systematic arrangement is ambiguous and the experimenter has an 
arbitrary choice between several widely different estimates.” 

We may observe in passing that this is another instance of Prof. Fisher’s
passion for generalizing on somewhat narrow foundations, for the possibility 
which he refers to is peculiar to the half-drill strip arrangement. 

In the half-drill strip, however, it is possible either to calculate the error from 
such aggregates as ABBA which I termed sandwiches in my paper to this 
Journal or from the separate parts of such aggregates, AB and BA, termed 
“pairs” by Prof. Fisher. 

Of these the former is clearly the better if only there is a sufficient number of 
replications to give a good estimate of the error. As this is unusual it is generally 
best to give a degree of freedom to the fertility slope and calculate the error from 
“pairs”. 

Admittedly this tends to overestimate the error with the sort of results 
obtained in § 4. Faced with this choice, I personally choose the method which is 
most likely to be profitable when designing the experiment rather than use 
Prof. Fisher’s system of a posteriori choice* which has always seemed to me to 
savour rather too much of “heads I win, tails you lose”. 

§ 3. A Properly Balanced Arrangement 

It appears then that Prof. Fisher’s paper is altogether irrelevant to the question 
at issue, but in order that Dr Barbacki’s work may not be wholly wasted we can 
make a calculation of the error of a properly balanced arrangement of plots of the 
same size as the “randomized sandwiches” of which he has calculated the error. 

For it will be noticed that Prof. Fisher’s “systematic” arrangement, though 
“balanced” as “half-drill strips”, is not so when regarded as a number of “sheaf 
weights”: lateral balance is necessary. 

* Statistical Methods for Research Workers, § 24.1 (5th ed.), p. 125. 
The obvious layout is therefore to have the ABBA arrangement in both
directions.

Thus:

    ABBAABBAAB                       AAAAAAAA
    BAABBAABBA                       BBBBBBBB
    BAABBAABBA  etc.   instead of:   BBBBBBBB  etc.
    ABBAABBAAB                       AAAAAAAA
    ABBAABBAAB                       AAAAAAAA
    BAABBAABBA                       BBBBBBBB
       etc.                             etc.

This is merely a chessboard with fringes, each square being divided at harvest
into four. The “squares” should be long and narrow, to gain the advantage of
contiguity, and the comparisons should be made between adjacent long subplots
of the different varieties. I have not seen this rather obvious arrangement
mentioned before; it is admittedly no more suited for agricultural work than
“randomized sandwiches”, but it might be used in horticultural work, where the
“reduced borders” would be of advantage, or for pot culture.
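The layout above can be generated mechanically. The Python sketch below is ours, not the author’s: it assumes the ABBA pattern simply repeats along both rows and columns, so that a plot carries variety A whenever its row position and its column position fall on the same letter of the pattern. The function names `abba` and `balanced_layout` are hypothetical.

```python
def abba(k):
    """0 for an 'A' position and 1 for a 'B' position in the repeating ABBA pattern."""
    return 0 if k % 4 in (0, 3) else 1


def balanced_layout(rows, cols):
    """ABBA in both directions: a plot is A when its row and column positions agree."""
    return [
        "".join("A" if abba(i) == abba(j) else "B" for j in range(cols))
        for i in range(rows)
    ]


for line in balanced_layout(6, 10):
    print(line)
```

Printing `balanced_layout(6, 10)` gives the six rows shown above. In any run of eight plots, along a row or down a column, four are A and four are B, which is the lateral as well as longitudinal balance asked for.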

In this case we can start from Dr Fisher’s Table II by reversing the signs of
columns (ii), (iii), (vi), (vii), (x) and (xi), and calculate the error from an analysis
of variance as follows:*


Variance due to | Degrees of freedom | Sum of squares of “split drill” differences
Longitudinal fertility slopes | 12 | 887,171
Lateral fertility slopes | 8 | 4,508,506
Varietal difference | 1 | 2,741
Residual errors | 75 | 3,988,681
Total | 96 | 9,387,099
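As a quick arithmetic check on the table, the four components should add up to the totals, and the varietal line can be set against the residual mean square. The sketch below does only that. The dictionary layout and names are ours, and it makes no attempt to reconstruct how the sums of squares themselves were obtained.

```python
# Figures copied from the analysis of variance above: (degrees of freedom, sum of squares)
table = {
    "Longitudinal fertility slopes": (12, 887_171),
    "Lateral fertility slopes": (8, 4_508_506),
    "Varietal difference": (1, 2_741),
    "Residual errors": (75, 3_988_681),
}
TOTAL_DF, TOTAL_SS = 96, 9_387_099

# The components should add up exactly to the printed totals
assert sum(df for df, _ in table.values()) == TOTAL_DF
assert sum(ss for _, ss in table.values()) == TOTAL_SS

residual_df, residual_ss = table["Residual errors"]
residual_ms = residual_ss / residual_df  # residual mean square
varietal_df, varietal_ss = table["Varietal difference"]
variance_ratio = (varietal_ss / varietal_df) / residual_ms

print(f"residual mean square {residual_ms:.1f}; variance ratio for varieties {variance_ratio:.3f}")
```

A variance ratio this far below 1 is in keeping with the statement that the difference lies comfortably within the s.d.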


The difference between A and B is thus 513g. and the s.d. of this difference
59, as compared with 2353 calculated from “random sandwiches”.

Thus, as we should expect, the difference is comfortably within the s.d., and
the s.d. is a little below that calculated from “randomized sandwiches”, itself a
partially balanced arrangement though random.

We see then that if a properly balanced arrangement is put down on the
uniformity experiment of Dr Fisher’s choice the error is found to be, as usual, less
than with his random arrangement, though not by much, since “sandwiches” are
themselves balanced.

* [The … of this analysis does not seem clear. It was a point on which
Student had promised to enlarge before the final presentation of the paper; see the
editorial note on p. 199 at the head of this article. Ed.]


§ 4. The Effect of “Balancing” on the “Validity” of Conclusions

From a priori considerations (and Mr Hudson’s and Mr Borden’s experiments
are in accordance with this expectation) it seems fairly certain (i) that “balancing”
has no tendency to bias the mean, and (ii) that when there is a “fertility slope”,
or anything corresponding to it, e.g. a time effect, the result will be to increase
the apparent error but to decrease the real error. What effect has this on the
“validity” of conclusions drawn from balanced experiments?


(i) The case of blocks, randomized or balanced, judged by the z test

Let us take the case of four treatments in six blocks giving fifteen degrees of 
freedom to the residual error and three for treatments, and let us suppose the 
arrangement put down on a uniformity trial. 

Then, once the plots and blocks are marked out, the “total sum of squares” 
and the “sum of squares due to blocks” are fixed; the difference between these
represents in all cases the eighteen degrees of freedom due to treatments and 
residual error, but will be divided between the two in different proportions 
according to the chosen arrangement of the treatments in the blocks. If the 
arrangement is random the frequency of any particular ratio is known to follow 
the z distribution, and owing to the skewness of this there will more often than 
not be a lower variance of the treatments with three degrees of freedom than of 
the residuals with fifteen. 
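The skewness argument can be illustrated by simulation. Fisher’s z is half the natural logarithm of the ratio of the two variances, so a negative z simply means that the treatment variance is the smaller. The sketch below assumes independent normal plot errors on the uniformity trial, so that under random arrangement the treatment and residual mean squares behave as scaled chi-square variates with 3 and 15 degrees of freedom. The function name, the number of trials and the seed are arbitrary choices of ours.

```python
import random


def fraction_treatment_variance_lower(trials=20_000, seed=1):
    """Share of simulated random arrangements whose treatment variance (3 d.f.)
    falls below the residual variance (15 d.f.), assuming normal plot errors."""
    rng = random.Random(seed)
    lower = 0
    for _ in range(trials):
        treatments = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(3)) / 3
        residual = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(15)) / 15
        if treatments < residual:  # equivalently, z = 0.5 * ln(ratio) < 0
            lower += 1
    return lower / trials


print(fraction_treatment_variance_lower())
```

The printed share comes out above one half, which is the sense of “more often than not” in the text.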

If the arrangement is not random the frequencies will not follow the z
distribution, e.g. with regular unbalanced arrangements the variance “due to 
treatment” will tend to be high compared with that of “residual error”, while 
with regular balanced arrangements the reverse is the case. It will therefore be of 
interest to see what happens when a real “variance due to treatment ” is imposed 
on uniformity trials which give ratios at different points of the z scale. 

Thus it may be convenient to take as norm those uniformity trials which have
the same variance for “means of treatments” as that calculated from the residuals,
and let this variance be $\sigma^2$. Then another set of trials may be considered of which
the means have a variance of $0.5\sigma^2$ and consequently a variance of “residual
error” of $1.1\sigma^2$, since $15 \times 1.1 + 3 \times 0.5 = 18$. This set may be taken to represent
the tendency of balanced arrangements to produce low variance “due to treatment”.
A third set representing “unbalanced” arrangements may be taken with
a means variance of $1.5\sigma^2$ and a variance calculated from residuals of $0.9\sigma^2$.

All three of these occur, of course, in their proper proportions in random trials 
and are none of them uncommon. They are merely taken here as types. 
In what follows I shall for convenience term the variance of means the actual
variance of error, $\sigma_e^2$, and the variance calculated from residuals the calculated
variance of error.

Now suppose that a real variance due to treatment, $\sigma_T^2$, measured without error,
be superposed upon the uniformity experiment. Then the calculated variance
of error will be unaffected and the observed variance due to treatments will be
$\sigma_e^2 + \sigma_T^2 + 2r_{eT}\sigma_T\sigma_e$; and, since $T$ and $e$ are independent, the distribution of the
observed variance can be calculated from the known distribution of $r$ when there
is no correlation, which in this case of four treatments is uniform between $+1$
and $-1$.

From this we can determine the probability that any given $\sigma_T^2$, superposed on
any particular arrangement, will be deemed “significant” when compared with
the corresponding “calculated variance of error”.
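The calculation just described can be sketched in a few lines. Since $r$ is uniform on $(-1, 1)$, the chance that $\sigma_e^2 + \sigma_T^2 + 2r\sigma_T\sigma_e$ exceeds a limit $L$ is simply the share of that interval lying above $(L - \sigma_e^2 - \sigma_T^2)/(2\sigma_T\sigma_e)$. In the sketch below (names ours), each limit is taken as 3.29, the 5 % point of the variance ratio for 3 and 15 degrees of freedom, times the calculated variance of error, with the calculated variance fixed by $3 \times \text{actual} + 15 \times \text{calculated} = 18$ as in the text. This gives limits agreeing with the printed ones to about 0.01, and rounding the output to two figures reproduced the printed probabilities at the points we checked.

```python
import math

F_5_PERCENT = 3.29  # 5 % point of the variance ratio for 3 and 15 degrees of freedom


def calculated_variance(actual_variance):
    """Residual ('calculated') variance when 3 * actual + 15 * calculated = 18."""
    return (18.0 - 3.0 * actual_variance) / 15.0


def prob_significant(var_t, var_e, limit):
    """Chance that the observed treatment variance exceeds `limit` (units of sigma^2).

    observed = var_e + var_t + 2 * r * sqrt(var_t * var_e), with r uniform on
    [-1, 1], its no-correlation distribution for four treatments.
    """
    if var_t == 0.0 or var_e == 0.0:
        return 1.0 if var_e + var_t > limit else 0.0
    c = (limit - var_e - var_t) / (2.0 * math.sqrt(var_t * var_e))
    # P(r > c) for r ~ Uniform(-1, 1), clipped to the range [0, 1]
    return min(1.0, max(0.0, (1.0 - c) / 2.0))


actuals = (1.5, 1.0, 0.5)
limits = [F_5_PERCENT * calculated_variance(a) for a in actuals]
for step in range(1, 19):
    var_t = step / 2
    row = [prob_significant(var_t, a, lim) for a, lim in zip(actuals, limits)]
    print(f"{var_t:4.1f}", " ".join(f"{p:.2f}" for p in row))
```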

The results of such calculations are given in the following table, which gives 
the probability of exceeding the 5 % limit of significance, or if preferred can be 
read as the percentages of “significant” results. 


Probability of obtaining significant result

Actual variance of error | 1.5σ² | 1.0σ² | 0.5σ²
Limit of significance | 2.96σ² | 3.29σ² | 3.63σ²
Value of σ_T² | | |
0.5 | 0.22 | 0 | 0
1.0 | 0.41 | 0.18 | 0
1.5 | 0.51 | 0.34 | 0.03
2.0 | 0.58 | 0.45 | 0.22
2.5 | 0.63 | 0.53 | 0.36
3.0 | 0.68 | 0.60 | 0.48
3.5 | 0.72 | 0.66 | 0.57
4.0 | 0.76 | 0.71 | 0.66
4.5 | 0.79 | 0.76 | 0.73
5.0 | 0.82 | 0.80 | 0.80
5.5 | 0.85 | 0.84 | 0.86
6.0 | 0.88 | 0.88 | 0.92
6.5 | 0.90 | 0.91 | 0.97
7.0 | 0.93 | 0.94 | 1.00
7.5 | 0.95 | 0.98 | -
8.0 | 0.97 | 1.00 | -
8.5 | 0.99 | - | -
9.0 | 1.00 | - | -


This table illustrates the fact that arrangements which give an actual error 
less than the calculated fail to give as many “significant” results as those which 
give larger actual errors up to a real treatment variance of about five times the 
average residual variance, at which point about 20 % of the experiments still fail 
to show significance in each case. When the real treatment variance rises above 
this point, the smaller the actual error the more are the significant results. 

It is perhaps rather invidious to decide below what value of the real treatment
variance “significant” results are misleading, but in any case it is clear that the 
fault of the arrangements with low actual variance is not lack of validity. On the 
contrary, conclusions drawn from experiments giving significant results by such 
arrangements are more valid in the ordinary sense of that word. 

These arrangements have so far been considered as having arisen in a random 
manner, but by using balanced arrangements the proportion of arrangements 
having actual low errors is increased, and hence conclusions arrived at from 
balanced arrangements are more, not less, valid. 

Nevertheless, it is clear that if it is required to calculate the error from an 
experiment carried out at a single station it is advisable not only to balance the 
experiment but to allow for the error eliminated by allocating a degree of freedom 
to the fertility slope. Even so it is likely that the actual error will be less than 
the calculated and the conclusions more valid than they appear to be. 


(ii) The case of half-drill strips judged by the t test 

I showed in the Appendix to my paper on Co-operative Experiments that it is
usually advantageous to allot one degree of freedom to the fertility slope, and 
that since fertility slopes are not usually strictly linear there is a tendency for 
the calculated error to be larger than the actual error. Let us illustrate this in the 
case of experiments carried out on the scale adopted by the N.I.A.B., namely, 
with ten pairs of comparisons; this is of course rather a small scale, and of the nine 
degrees of freedom one is allocated to the fertility slope and eight to the residual 
error of comparing the two varieties. 

In this case we are to vary, not the position of treatments on a given piece of
ground, but the pieces of ground on which a half-drill strip of ten pairs is set, and
the “norm” which we shall take is the case where, owing to a particular uniform
fertility slope, the calculated and the actual error exactly correspond with the
standard error $\sigma$.

With this we can compare a case where the variance of actual error is $0.5\sigma^2$ and
the calculated error therefore $\left(1 + \frac{1}{16}\right)\sigma^2 = 1.062\sigma^2$, i.e. standard errors $0.71\sigma$
and $1.03\sigma$. A tendency in this direction is, as noted above, common, since fertility
slopes are naturally not uniform; on the other hand, when the fertility slope is
small, random sampling may give us a case where the actual error is larger than
the calculated, let us say standard errors of $1.22\sigma$ and $0.97\sigma$.

Then in the three cases we find from the $t$ table that the 5 % significance point
is, for the “norm”, $2.30\sigma$; for the low actual error, $2.37\sigma$; and for the high actual
error, $2.23\sigma$; while the actual errors are distributed normally with standard errors
$\sigma$, $0.707\sigma$ and $1.22\sigma$. The percentage of “significant” results, i.e. those above the
significant point calculated above, can then be readily determined for values of the
real (i.e. measured without error) differences between the two “varieties”, say $A - B$.

These are given in the following table.


Variance of calculated error | 0.94σ² | 1.0σ² | 1.06σ²
Variance of actual error | 1.5σ² | 1.0σ² | 0.5σ²
s.e. calculated | 0.97σ | 1.0σ | 1.03σ
s.e. actual | 1.22σ | 1.0σ | 0.707σ
Limit of significance | 2.23σ | 2.30σ | 2.37σ

Value of A - B | Probability of significant results | |
0 | 0.07 | 0.02 | 0
0.5 | 0.01, 0.08 | 0.04 | 0
1.0 | 0.16 | 0.10 | 0.03
1.5 | 0.27 | 0.21 | 0.11
2.0 | 0.42 | 0.38 | 0.30
2.5 | 0.59 | 0.58 | 0.58
3.0 | 0.74 | 0.76 | 0.81
3.5 | 0.85 | 0.88 | 0.95
4.0 | 0.93 | 0.96 | 0.99
4.5 | 0.97 | 0.99 | 1.00
5.0 | 0.99 | 1.00 | -
5.5 | 1.00 | - | -
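The entries in this table can be recomputed on the normal distribution, as the text describes: the observed difference is taken as normal about the true $A - B$ with the actual standard error, and a result counts as significant when it lies beyond the limit in either direction. Two assumptions are ours. The first is that the 5 % point of $t$ for 8 degrees of freedom is used as 2.30, which reproduces the printed limits (tables give about 2.306). The second is that the calculated variance is $(9 - \text{actual})/8$ in units of $\sigma^2$, i.e. that whatever the fertility-slope degree of freedom absorbs is taken from a fixed total of $9\sigma^2$. That rule reproduces the three calculated variances in the table, but it is our reading rather than a derivation given in the text.

```python
import math

T_8_FIVE_PERCENT = 2.30  # 5 % point of t for 8 d.f., as used in the text


def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def calculated_variance(actual_variance):
    # Assumption: the nine d.f. of the ten differences carry 9 sigma^2 in all;
    # the slope d.f. takes the actual error variance, the other eight share the rest.
    return (9.0 - actual_variance) / 8.0


def significance_probs(diff, actual_variance):
    """(P(significant negative), P(significant positive)) for a true A - B of `diff`.

    The observed difference is Normal(diff, actual_variance); all in units of sigma.
    """
    limit = T_8_FIVE_PERCENT * math.sqrt(calculated_variance(actual_variance))
    se_actual = math.sqrt(actual_variance)
    negative = phi((-limit - diff) / se_actual)
    positive = 1.0 - phi((limit - diff) / se_actual)
    return negative, positive


for actual in (1.5, 1.0, 0.5):
    totals = [sum(significance_probs(k / 2, actual)) for k in range(12)]
    print(f"actual variance {actual}:", " ".join(f"{p:.2f}" for p in totals))
```

Rounded to two figures, the results agree with the table at the points we checked, though not necessarily at every entry.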


It will be noticed that in the left-hand column there are two probabilities
given opposite 0.5: 0.01 that a negative significant result and 0.08 that a positive
significant result will be obtained. Fortunately such a case is almost impossible
unless, of course, “randomized pairs” were used instead of a half-drill strip. What
we are concerned with in practice is something which tends towards the right-hand
column, which, as in the case of the balanced blocks, errs by failing to give
significant results when the difference to be measured is small, but from a value
of about 2.55 (at which all produce significant results in 60 % of trials) gives a
higher percentage than when the calculated and actual errors are equal.

It is clear, therefore, that in this case too, conclusions drawn from a balanced
arrangement are more valid than if the arrangement had been random.

The above tables rather emphasize the well-known paradox that it is just when
the experimenter is congratulating himself on the unusual smallness of his
experimental error (unusual, that is, for the type of experiment and number of
replications) that he is most likely to be betrayed into drawing false conclusions:
for the small calculated error indicates a large actual error, and this whether the
arrangement be random or balanced, though it is likely to occur more frequently
in the random.

In conclusion, I should like to emphasize the fact that when using the phrase
criticized by Prof. Fisher I was concerned with co-operative experiments carried
out at a number of different places.


Such experiments, as indeed all agricultural experiments, are only of value in
so far as the venue is representative of the conditions under which the results of
the experiment are to be applied, and so the result at any single station is not of
any particular importance in itself but only in its interaction with the results
obtained at the other stations, for only so can its representative nature be
established.

To take a simple case, a variety trial may indicate that one wheat will do better
than another in heavy but not in light soils; such a conclusion is more likely to
follow from an experiment carried out with a low real error and a correspondingly
high calculated error at the individual stations than if a low calculated error gave
“significant” results sporadically.

It is therefore important that the results should be determined with as little 
real error as possible, and the calculated error at each station is superseded by the 
error of the experiment as a whole. 


APPENDIX GIVING MR A. W. HUDSON’S COMPARISONS OF RANDOM 
AND REGULAR ARRANGEMENTS IN UNIFORMITY TRIALS 

Mr Hudson’s account of his procedure is as follows: 

“(i) Four, five or six imaginary treatments were allocated, according to which
was the most suitable to the full utilization of the data.

“(ii) These were allocated to blocks in a regular-balanced fashion and then
to the same blocks randomwise, using various numbers of ‘units’ per individual
plot.

“The regular arrangements were balanced by using two or four series in which
the treatments in the second and fourth series were in opposite order to those in the
first and third, thus:

1, 2, 3, 4, 1, 2, 3, 4, etc.
4, 3, 2, 1, 4, 3, 2, 1, etc.
2, 1, 4, 3, 2, 1, 4, 3, etc.
3, 4, 1, 2, 3, 4, 1, 2, etc.

or alternatively, where the shape of the individual plot permitted, only a single 
series, thus: 

etc., 2, 1, 4, 3, 2, 1 Middle 1, 2, 3, 4, 1, 2, 3, etc.” 
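A small sketch of the two layouts Mr Hudson describes, for four treatments. It reads “opposite order” as plain reversal of a series, which reproduces the four series quoted when their length is a multiple of four, and it treats the single-series alternative as a mirror image about the middle of the strip. The function names are hypothetical, and Hudson’s own allocation for five or six treatments is not reconstructed here.

```python
def repeat(pattern, length):
    """Repeat `pattern` cyclically to the given length."""
    return [pattern[i % len(pattern)] for i in range(length)]


def four_series(length=8):
    """Four balanced series for four treatments ('opposite order' read as reversal)."""
    first = repeat([1, 2, 3, 4], length)
    third = repeat([2, 1, 4, 3], length)
    return [first, first[::-1], third, third[::-1]]


def mirrored_series(n_treatments, half_length):
    """Single series laid out symmetrically about the middle of the strip."""
    right = repeat(list(range(1, n_treatments + 1)), half_length)
    return right[::-1], right


for series in four_series():
    print(series)
```

`mirrored_series(4, 6)` returns `[2, 1, 4, 3, 2, 1]` for the left of the middle and `[1, 2, 3, 4, 1, 2]` for the right, matching the single series quoted.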

Mr Hudson’s experimental work must not be taken as an attempt at a proof
that balanced arrangements are likely to give a lower error than random unbalanced
arrangements; that seems to me obvious, and it is for those who wish to
disprove the obvious to obtain evidence in support of their eccentric opinions, but
it does give an interesting illustration of what is likely to happen in practice, and I
print it in the hope that it will help to clarify other people’s ideas as it has mine.
Table I

Data from Journal of Agricultural Science, Vol. iv, Part 2, 1911
Mercer and Hall. Mangold Plots

Number of rows ... 20   Units per row ... 10   Total number of units ... 200, but only 160 used in first three


B./Tr. | R. × U. | G.M. | Random: Calc. S.E. | Random: Dev. of T.M. from G.M. | Random: Actual S.E. | Balanced: Calc. S.E. | Balanced: Dev. of T.M. from G.M. | Balanced: Actual S.E.
20/4 | 1 × 2 | 656.4 | 6.63 | -3.3, -5.7, +1.6, +7.5 | 5.84 | 6.73 | +4.4, -1.0, -1.7, -1.7 | 2.95
10/4 | 2 × 2 | 1312.8 | 14.16 | +10.2, -4.1, -12.2, +6.3 | 10.15 | 14.42 | -0.8, +8.3, -1.9, -5.4 | 5.54
10/4 | 1 × 4 | 1312.8 | 16.40 | +12.7, -18.0, +7.5, -2.0 | 13.48 | 16.61 | +16.3, -5.8, -6.8, -3.5 | 10.92
8/5 | 1 × 5 | 1642.9 | 21.62 | -32.8, -20.0, +27.8, +4.3, +20.5 | 25.9 | 22.92 | +15.0, -16.2, -5.5, +19.7, -13.2 | 16.4
4/5 | 2 × 5 | 3285.7 | 50.78 | +55.0, -25.0, -4.2, +43.0, -68.7 | 50.6 | 54.62 | +15.3, -15.7, -53.2, +9.3, +44.5 | 36.7


Table headings: B./Tr. Blocks (replications) and treatments.

R. × U. Size of plot, rows × units.

G.M. General mean of all plots.

Calc. S.E. S.E. of treatment means by analysis of variance.

Dev. of T.M. from G.M. Deviations of treatment means from general mean.

Actual S.E. S.E. calculated from previous column.
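The legend gives the Actual S.E. as calculated from the previous column. One reading that reproduces most of the printed values is the standard deviation of the treatment-mean deviations with divisor $k - 1$ for $k$ treatments. The sketch below uses that reading, which is ours and not stated in the table, on deviations copied from Table I.

```python
import math


def actual_se(deviations):
    """Standard deviation (divisor k - 1) of k treatment-mean deviations from the general mean."""
    k = len(deviations)
    return math.sqrt(sum(d * d for d in deviations) / (k - 1))


random_row_1 = [-3.3, -5.7, +1.6, +7.5]    # Table I, first row, random arrangement
balanced_row_1 = [+4.4, -1.0, -1.7, -1.7]  # Table I, first row, balanced arrangement
print(round(actual_se(random_row_1), 2), round(actual_se(balanced_row_1), 2))
```

This gives 5.84 and 2.95 for the first row, as printed, and 50.6 for the random arrangement in the last row. For the balanced arrangement in the second row it gives 5.84 where 5.54 is printed.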
Table II

Data from Journal of Agricultural Research, Vol. xliv, No. 8, April 1932
F. R. Immer. Yields of sugar beet
Table III

Data from Journal of Agricultural Science, Vol. xxii, Part 2, April 1932

Kalamkar. Potatoes


Number of rows ... 96 Units per row ... 6 Total number of units ... 576 


B./Tr. | R. × U. | G.M. | Random A: Calc. S.E. | Random A: Dev. of T.M. from G.M. | Random A: Actual S.E. | Random B: Calc. S.E. | Random B: Dev. of T.M. from G.M. | Random B: Actual S.E. | Balanced: Calc. S.E. | Balanced: Dev. of T.M. from G.M. | Balanced: Actual S.E.
32/6 | 1 × 3 | 69.8 | 0.74 | -0.2, +0.4, -0.1, -0.6, -0.3, +0.8 | 0.51 | 0.74 | -0.6, -0.1, -0.3, +0.2, +0.6, +0.3 | 0.44 | 0.74 | -0.5, +0.3, +0.9, -0.1, +0.3, -1.0 | 0.67
16/6 | 1 × 6 | 139.6 | 1.52 | -1.9, 0, +2.7, -1.9, +0.7, +0.4 | 1.74 | 1.49 | +1.1, +1.8, -3.1, +1.2, +1.1, -2.1 | 2.05 | 1.55 | -1.4, +1.3, +0.9, +0.7, 0, -1.5 | 1.20
16/6 | 2 × 3 | 139.6 | 2.16 | +0.2, -1.1, -2.9, -0.1, +0.4, +3.5 | 2.10 | 2.19 | -2.8, +1.5, +0.8, +0.2, +0.1, +0.2 | 1.52 | 2.20 | +0.2, -0.3, -2.1, -0.5, +2.0, +0.8 | 1.38
8/6 | 2 × 6 | 279.2 | 5.35 | +8.8, +0.9, -5.4, -1.4, -2.8, -0.1 | 4.84 | 5.47 | +0.8, +6.0, +2.1, -3.8, -1.1, -4.1 | 3.83 | 5.56 | +3.0, +3.3, -2.0, -3.3, 0, -1.0 | 2.68
8/6 | 4 × 3 | 279.2 | 5.67 | +1.3, +6.1, +1.1, -3.9, -3.5, -1.1 | 3.71 | 5.58 | +8.0, -1.3, -3.8, -0.1, -4.4, +1.6 | 4.52 | 5.60 | +2.4, -4.4, -2.2, -2.7, -0.9, +7.7 | 4.42
4/6 | 8 × 3 | 558.4 | 33.3 | -12.1, -7.2, +41.0, +35.8, -19.0, -38.7 | 31.7 | 35.85 | -5.9, +39.0, +10.8, -16.9, -12.5, -14.6 | 21.6 | 37.76 | -3.3, -10.8, +5.6, -0.6, +[illegible], +7.8 | 6.7
MISCELLANEOUS CONTRIBUTIONS


A. LETTERS TO NATURE 
(i) Agricultural Field Experiments 
[Nature, cxxvi (29 November 1930), p. 843] 

In the article with the above title which appears in Nature of 25 October last, 
p. 667, it is stated: 

“Beaven’s half-drill strip method is described, but without pointing out its
two serious but remediable defects: that the continued use of one half of the
drill for one variety, and of the other half for the variety with which it is to be
compared, may introduce a constant difference the magnitude of which cannot
be estimated; and that the regular alternation of strips of the two varieties does
not permit of a valid estimate of experimental error.”

I submit that these defects are more theoretical than practical, and that any 
modification of practice in the application of the method, such as changing over 
seed boxes, would be a retrograde step. 

To take the first, there are three possible ways in which one half of a drill 
may differ from the other: 

(1) It may cover a wider breadth of ground; this would doubtless have an 
appreciable effect, but it would be detected and allowed for by the routine 
measurements taken across the stubble. 

(2) The coulters may be less evenly spaced than those of the other, and 

(3) Less seed may be drilled from it than from the other. 

Now, cereal crops are wonderfully independent of the amount of seed sown. 
I have in mind two chessboard experiments, in one of which half the area was 
sown with seed 1 in. apart instead of the usual 2 in., and in the other, the rows 
in half the experiment were 3 in. apart instead of 6 in. In each case the heavier 
seeding only resulted in a gain of about 3 %, and it is not to be expected that 
such slight irregularities as occur between the two halves of a drill would have 
any measurable effect. 

The second defect, owing to the peculiar shape of the half-drill strip, would 
only exist if the experiment were to be sited so that some periodic variation 
existed across the breadth of the drills: otherwise randomness is supplied by the 
soil. By taking care that the experiment is drilled across ploughman’s “lands” 
if they exist, and by bearing in mind the history of the last few crops, this danger 
can be avoided. 
The pairs of strips fall naturally into two sets according as one or other variety
is on the right hand, and in an analysis of the variance of the difference between 
varieties, one degree of freedom is taken up by these two sets. The estimate of 
the experimental error arrived at in this way is perfectly valid, provided the 
above precautions have been taken in siting the experiment. 
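The analysis described here can be sketched as follows, assuming the differences (A - B) are grouped by which variety was on the right hand. One degree of freedom goes to the varietal mean, one to the two sets, and the rest to error. The varietal difference is estimated as the average of the two set means, so that a constant right-hand effect cancels. The function and argument names are ours, and the example figures in the usage note are made up for illustration.

```python
import math


def half_drill_error(right_a, right_b):
    """Mean A - B difference, its standard error, and the error d.f.

    right_a: differences (A - B) for pairs with variety A on the right hand
    right_b: differences (A - B) for pairs with variety B on the right hand
    One d.f. goes to the varietal mean and one to the two sets.
    """
    def mean(xs):
        return sum(xs) / len(xs)

    def ss_about_mean(xs):
        m = mean(xs)
        return sum((x - m) ** 2 for x in xs)

    n1, n2 = len(right_a), len(right_b)
    error_df = n1 + n2 - 2
    error_variance = (ss_about_mean(right_a) + ss_about_mean(right_b)) / error_df
    # Averaging the two set means cancels any constant "right-hand" effect
    mean_diff = (mean(right_a) + mean(right_b)) / 2
    se = math.sqrt(error_variance * (1 / n1 + 1 / n2) / 4)
    return mean_diff, se, error_df
```

For instance, with made-up differences `[1.0, 3.0]` and `[5.0, 7.0]` the function returns a mean difference of 4.0 with 2 error degrees of freedom.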

It would be a pity to interfere unnecessarily with the simplicity of this very 
efficient method of conducting field trials. 

(ii) Agricultural Field Experiments 
[Nature, cxxvii (14 March 1931), p. 404]

Mr Howard’s letter in Nature of 31 January last (p. 166) gives interesting confirmation
of the reviewer’s opinion in Nature of 29 November, 1930, p. 843, that
depth of sowing influences the yield of wheat, yet I venture to suggest that such 
an extreme case as he quotes scarcely bears upon the point at issue. When seeds 
do not germinate, it is equivalent to a light seeding rate, which, as I pointed 
out, makes wonderfully little effect on the yield. Whether such differences as 
one may expect to occur between the depths of coulters in the same drill make 
any appreciable effect on the yields of the different rows is still, I think, an open 
question, and I suggest that the differences which the reviewer has observed 
between the yields of his rows may have been due to their being unevenly 
spaced. The yield which is comparatively unaffected by seeding rate, is that per 
areal and not that per linear unit. The reviewer quotes “an apparently uniform 
field” at Aarslev as upsetting my view that for practical purposes randomness 
can be obtained from the half-drill strip “provided care is taken to drill across 
ploughman’s ‘lands’”, if they exist; yet Dr Sanders in his account of that
experiment makes no mention of an “apparently uniform field” (J. Agric. Sci.
xx, p. 65), but writes, “This oscillation apparently arose as a legacy of the old
practice of ploughing in high ridges”, and so on. 

Even if the unsuitability of the field had been overlooked, the Aarslev plots 
were probably a good deal wider than drill width, and half-drill strips would 
have been extremely unlikely to coincide both in breadth and phase with the 
periodicity in question, while any partial coincidence would have betrayed the 
existence of the snare. 

Finally, there is a fallacy in Mr Howard’s last sentence: “It is obvious in
such questions that nothing can be gained by the application of formulae and 
figures to the results obtained by poor agriculture.” There is no question, of 
course, of connecting the half-drill strip method of experimenting with poor 
agriculture; its great merit lies in the fact that in its present form it is ordinary 
farming practice: if, however, that practice were poor agriculture, it would be 
a mistake to carry out trials by methods conforming to better standards: field
trials must be capable of being considered a random sample of the practice, not 
of the theory, of agriculture. 

This may seem a hard saying, but an example will make my meaning clear. 
After a long series of experiments the Irish Department of Agriculture decided 
to introduce Dr Hunter’s Spratt-Archer barley as being the best suited for the 
country. This was almost everywhere a great and outstanding success; yet in 
one district, which shall be nameless, the farmers refused to grow it, alleging that 
their own native race of barley was superior to it. After some time the Department,
to demonstrate Spratt-Archer’s superiority, produced a single-line culture
of the native barley and tested it against the Spratt-Archer in the district in 
question. To their surprise, they found the farmers were perfectly right: the 
native barley gave the higher yield. At the same time the reason became plain: 
the barley in question starts more quickly and is able to smother the weeds, 
which flourish in that not too well farmed land; Spratt-Archer, growing less 
strongly at first, is, however, the victim and not the conqueror of the weeds, 
and the original experiments, carried out on well-farmed land, were definitely 
misleading when their conclusions were applied elsewhere. 

Taught by experience, the Department is now engaged in breeding a barley 
to meet their conditions; and this barley, when obtained, will rightly be tested 
by “results obtained by poor agriculture”. 


(iii) The Half-Drill Strip System 
Agricultural Experiments 

[Nature, cxxxviii (5 December 1936), p. 971]

Prof. R. A. Fisher and Dr Barbacki have recently published a paper 
in the Annals of Eugenics entitled “A test of the supposed precision of systematic
arrangements”.* There is a good deal in the paper with which I am not
in agreement and with which I hope to deal elsewhere, but a letter from a friend 
of mine in Australia, who has heard at second-hand that Fisher’s “results 
showed not only that the half-drill strip failed to give a valid estimate of error 
but was less accurate”, shows that it would be better not to let such rumours 
get a start, for they are quite unfounded. 

In the paper, the crop on a uniformly treated field was assigned to two 
imagined treatments A and B on a systematic plan in which eight strips of the 
width of a half drill were assigned to A, and eight to B, in the usual arrangement 
of an eight comparison half-drill strip experiment. Apart from the fact that one 

* Barbacki and Fisher, “A test of the supposed precision of systematic arrangements”,
Ann. Eugen. vii, Part 2 (1936).
should have at least ten comparisons (in Beaven’s original paper* there were
26), the representation is a fair one.

The authors, for the purpose of ascertaining the degree of precision which is 
obtainable from the systematic arrangement in question, have taken the weights 
of grain, not from the total area of each of the 16 strips, but from 12 sections 
of each strip, and have treated these 192 sections as if they were independent 
half-drill strips—in fact they have called them half-drill strips—and from the 96 
comparisons they have calculated a standard error to represent the precision 
which they suppose an advocate of systematic arrangements would attribute 
to the method. But, of course, the sections of a half-drill strip are not in fact 
independent, and in this case are markedly correlated, so that the figure which
they obtain is much too small to account for the observed difference between the
A’s and the B’s, and they draw conclusions adverse to the systematic arrangement
and not to their own method of calculation.

The procedure adopted, of dividing up the long strips, is that which Dr 
Beaven* originally proposed in 1922, namely, weighing the sheaves off equal 
segments of his half-drill strips and calculating the error from these weights; 
but so early as 1923, I pointed out† that this method would probably give a
fallaciously small value, and since then it has been customary to regard the 
whole length of the strip as the unit in the calculation. 

Had Prof. Fisher and Dr Barbacki calculated the error on that basis,‡ they
would have found a standard error of 2.37 % of the average yield, while the
actual difference between the A’s and B’s amounts to 1.75 %; that is, the difference
between two things which should be the same within the error of random
sampling is in fact no more than 0.75 times the standard error.

The authors’ practical demonstration of the correctness of my a priori 
reasoning is, of course, very gratifying to me, but I must nevertheless insist 
that their paper has no bearing whatever on the error of present-day half-drill 
strip experiments. 

* Beaven, “Trials of new varieties of cereals”, J. Minist. Agric. xxix, Nos. 4 and 5 (1922).

† “Student”, “On testing varieties of cereals”, Biometrika, xv (1923), pp. 286, 287, [11, p. 106].

‡ “Student”, “Yield trials”, Baillière’s Encyc. Sci. Agric. ii (1931), [15]. “Co-operation
in large-scale experiments”, J. Roy. Statist. Soc. Supplement (1936), [20].
B. CONTRIBUTIONS TO DISCUSSIONS AT MEETINGS OF THE
INDUSTRIAL AND AGRICULTURAL RESEARCH SECTION 
OF THE ROYAL STATISTICAL SOCIETY 


(i) 


[Supplement to J. Roy. Statist. Soc. (1934), i, p. 18]

Mr Gosset said that Dr Pickard had given such a wide and comprehensive 
survey of the application of statistical methods to industry that, in spite of 
having had a considerable experience himself, there was practically nothing 
that he could add on the subject. He had started with the raw material in the 
field, and ended with the finished cloth, and in his own particular industry he 
had had similar problems from one end to the other. 

He would like to refer to the question raised at the end of the paper—the 
selection of the statistician for industry. In his firm, a man who had had some 
experience of the industry had been sent out and taught statistics. That had 
happened some time ago—in fact it was twenty-eight years since he had ridden 
across the Berkshire Downs on a bicycle to interview Prof. Karl Pearson in 1905. 
On the whole, this had been found to be a good method, and perhaps because 
they had been working at it for so long, they did not experience the difficulty 
of the horrible jargon referred to by Dr Pearson, and it did not appear to produce 
quite such terrors even among the senior members of the firm. They more or 
less understood, and if they did not understand they were quite polite about it. 

If a man were sent out from the industry and put to school again, he was apt 
to forget what he had learned, and it was most important that such people 
should be in constant touch with their Professors. As Dr Pearson had pointed 
out, one reason was that the mathematical tools which the Professors provided 
would hardly be exactly what were wanted unless they knew how they were 
to be used. 

Another point arose from the peculiar nature of statistics. It was impossible 
to apply statistical methods to industry or anything else unless one had a certain 
amount of intelligent experience as a background. That worked both ways. The 
practical man had to go and talk to his Professors partly in order that the 
Professor himself should share his experience. In actual fact all statistical 
methods were strictly inapplicable to practical affairs; they all depended upon 
random samples and, as everyone knew, there were no such things. That, of 
course, was an exaggeration; there were two random phenomena, one of which was 
the disintegration of radioactive elements, and the other was Tippett’s numbers. 
The whole art of statistical inference lay in the reconciliation of random mathematics
with biased samples. Every new problem had some fresh kind of bias
and might contain some new pitfall. The only way not to fall into these pitfalls
was to talk over the problem with some intelligent critic; and so the practical 
man, if he were not entirely foolish, talked over his problems with the Professor, 
and the Professor would not consider himself to be a competent critic unless 
he had had some experience of applying the statistics to industry, and had 
learned the difficulties of that application. 

(ii) 

[Supplement to J. Roy. Statist. Soc. (1936), iii, p. 173]

Mr Gosset said he would like to confirm a remark of Prof. Pearson’s about 
the difficulty of working with large-scale results. In most cases the whole object 
—or one of the principal objects—of manufacture is to keep the product as 
uniform as possible. In addition to that and in order to obtain that, it is 
necessary to keep the raw materials as constant as possible; consequently, when 
one looks at large-scale results, there is no variation to work upon, and the 
statisticians are helpless, at any rate until something has gone wrong. 

Mr Gosset said that up to the present he had been interested in spectacle 
glass only as a consumer, and his excuse for intervening in this discussion was 
that he could illustrate the use of a simple statistical method on the tables 
which were given at the end of the paper. 

In an investigation such as this, where one wished to throw light on the 
behaviour of a large-scale process, the method of correlation was very often 
useful, but at first sight the tables did not look very promising, split up as they 
were into very small samples, both by the small numbers of journeys per pot 
and the different kinds of glass. In this connexion he would say to Mr Jennett 
that there were two uses in correlation, one was the use of the regression line, 
and that was doubtless the best, and the other its use merely as a measure of 
the relation between the two things. 

There was a method of correlation used largely by psychologists, known as 
“Spearman’s method”; it was not an efficient method—that is, it did not utilize 
all the information supplied by the samples, so that about 20% larger samples
must be collected to give as accurate a result as the ordinary correlation
coefficient, yet, owing to an artful method of calculation, it was so simple that when
playing with other people’s figures, for instance on a railway journey, it was the
obvious one to use. It consisted of replacing each variate by the figure
representing its numerical order, and correlating these numbers.
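Spearman's method as just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Gosset's own working: the function names are hypothetical, tied values are given the average of the ranks they occupy (the text does not mention ties), and `spearman_shortcut` is the usual hand-calculation formula 1 − 6Σd²/(n(n² − 1)), which agrees with correlating the ranks only when there are no ties.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Replace each value by its 1-based position in sorted order.

    Assumption: tied values share the mean of the positions they occupy.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of equal values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Ordinary product-moment correlation (undefined if either series is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's coefficient: the ordinary correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def spearman_shortcut(x: Sequence[float], y: Sequence[float]) -> float:
    """Hand formula 1 - 6*sum(d^2) / (n(n^2 - 1)); exact only without ties."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))
```

For example, with `x = [1, 2, 3, 4, 5]` and `y = [2, 1, 4, 3, 5]` both functions give 0.8: the rank differences are ±1 four times, so Σd² = 4, and only that sum and n are needed, which is what makes the method quick by hand.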

By this method, Mr Gosset said he had obtained weighted average correlation 
coefficients between the number of veins and the order of the journey, which put 
it beyond all question that the later the journey the worse the veins. This 
weighted average was derived from all the 98 samples discoverable in the tables: 
the mean size of sample was just over 4, the greatest was 8 and the smallest 2.
The average results were as follows: 


Glass     Average correlation     Times its standard deviation
A              0.31                        2.6
B              0.27                        3.1
C              0.44                        2.2
D              0.11                        0.8
E              0.19                        1.3
Total          0.26                        4.5


A, B and C were all significant. D and E were not so, but there was no evidence
that any glass behaved differently from the others. When he said “standard 
deviation” it was calculated on the supposition that there was no correlation 
at all. It meant the standard deviation of correlation coefficients of samples 
of the appropriate degrees of freedom drawn from uncorrelated material, and 
the mean 0.26 corresponded to a correlation coefficient of about 0.30 if large
samples had been obtainable. 
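The "times its standard deviation" figures can be sketched as follows. Two assumptions are made, and the second is not stated in the text: first, that under no correlation a coefficient from a sample of n pairs has standard deviation 1/√(n − 1) (this holds exactly for Spearman's coefficient of untied ranks); second, that the small-sample coefficients were averaged with weights n − 1, in which case the weighted mean has null standard deviation 1/√Σ(n − 1). The function name is hypothetical.

```python
from math import sqrt


def pooled_correlation(samples):
    """Weighted mean of per-sample correlations, its null SD, and their ratio.

    samples: iterable of (r, n) pairs, one per small sample.
    Assumption: each r has null variance 1/(n - 1) and is weighted by n - 1.
    Returns (mean_r, null_sd, ratio).
    """
    samples = [(r, n) for r, n in samples if n >= 2]
    total_weight = sum(n - 1 for _, n in samples)
    mean_r = sum((n - 1) * r for r, n in samples) / total_weight
    # With inverse-variance weights, Var(mean_r) = sum(w^2 / w) / W^2 = 1 / W.
    null_sd = 1 / sqrt(total_weight)
    return mean_r, null_sd, mean_r / null_sd
```

As a rough consistency check only: 98 samples averaging just over 4 give Σ(n − 1) a little over 294, and 0.26 × √294 ≈ 4.5, in line with the "Total" row; the same rule would put the refractive-index figure of 0.16 at 1.6 times its standard deviation if Σ(n − 1) were about 100.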

This did not confirm the authors’ conclusion, and Mr Gosset could offer no 
opinion as to the disagreement unless it was the custom to stop using, at an 
early stage, pots which had given poor results. A similar investigation into seeds
showed that there was no evidence of correlation between seeds and order of
journey except in the case of glass A, where the correlation was 0.27, 2.3 times
the standard deviation. 

He had also tested the correlation between refractive index and both seeds 
and veins, the former without any success, but there was a distinct indication 
that the higher the refractive index, the worse the veins; perhaps the veins 
themselves had a low refractive index. The evidence was not significant, since 
the correlation coefficient 0.16 was but 1.6 times its standard deviation, but if
the matter was of any importance, this might give a line for further investigation. 

Mr Gosset again expressed his great interest in the subject-matter of the 
paper. 


(iii) 

[Supplement to J. Roy. Statist. Soc. (1937), iv, p. 89] 

Mr Gosset wished to say a word for the control chart. It had been talked 
about as a sort of wall ornament, but in point of fact it was a very useful thing. 
He had had control charts in the laboratory which had led up to nearly halving 
a laboratory error, because they gave a hint as to what to look for. 
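Gosset does not say how his laboratory charts were constructed. The sketch below assumes one common form: a centre line at the mean of a baseline run and limits three standard deviations either side, with later results outside the limits flagged as worth looking into. The function names and the factor `k = 3` are assumptions for illustration.

```python
from statistics import mean, stdev


def control_limits(baseline, k=3.0):
    """Lower limit, centre line and upper limit from a baseline run.

    Assumption: limits at k sample standard deviations from the mean;
    the baseline needs at least two values.
    """
    centre = mean(baseline)
    s = stdev(baseline)
    return centre - k * s, centre, centre + k * s


def out_of_control(values, limits):
    """Indices of results falling outside the control limits."""
    lower, _, upper = limits
    return [i for i, v in enumerate(values) if v < lower or v > upper]
```

With a baseline of `[9, 11, 9, 11]` the limits are about 6.5 and 13.5, so among the later results `[10, 14, 6, 12]` the second and third would be flagged.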

And in this discussion, although the method of testing the strength had been 
aspersed, it was clear from the control chart that the method was good enough 
to show secular changes, unless indeed, as was unlikely, the secular changes 
were due to the testing machine itself. 

(iv)

[Supplement to J. Roy. Statist. Soc. (1937), iv, p. 170] 

On reading Mr Bartlett’s paper, I saw that I could add little or nothing 
to his treatment of the statistical principles involved, but it occurred to
me that other people besides myself might have had their curiosity aroused by 
certain matters of less interest perhaps statistically but yet of some practical 
importance. I refer of course to the results of the experiments. I therefore 
wrote to Mr Bartlett, who very kindly sent me his copies of the four papers in 
the list of references, with which Dr Crowther’s name is associated, and I am 
going to give an account, necessarily inadequate, of the fine piece of work which 
they describe. 

The four papers deal primarily with the cotton crop in Egypt, particularly 
in the Delta. Cotton is grown in Egypt as an annual and not, as might be
expected, as a perennial, because of the pests by which it is afflicted, especially 
the Pink Boll Worm. This has so much increased of late years that the methods 
of cultivation have during the last ten years been modified throughout the 
country. At the same time, new varieties have been introduced, the tendency 
being to produce larger yields of cotton of shorter staple. That being so, it 
became necessary to examine how far these changes have altered the old 
standards of manuring and, particularly, what profit was to be derived from 
nitrogenous manures. 

The experiments directed by Dr Crowther were concerned mainly with the 
elucidation of this question and, as you have heard from Mr Bartlett, were 
carried out at several stations, where the effects of various levels of nitrogenous 
manuring were compared under different conditions of spacing, watering and 
phosphate manuring, and with different varieties of cotton. 

The actual gain from the use of nitrogen varied with the spacing adopted, 
with the different varieties and, naturally enough, between the different 
stations, but the average profit from the use of nitrogenous manures was over 
£3 per acre, and at only one out of eight stations was the profit not appreciable. 
Had the optimum quantity of nitrogen been used, the gain would have been 
considerably more. Furthermore, an experiment with wheat following cotton 
at a single station showed that in that case the increased yield of wheat more 
than paid for the nitrogen applied to the cotton. I think that is a very good 
instance of what large gain can be made: £3 an acre on all the cotton of Egypt 
would produce an enormous amount of money. 

These results may not seem to be very surprising until you learn (a) that 
previously it was generally believed that nitrogen was of little or no value to
the cotton crop, and (b) that in Egypt nitrogenous residues were supposed to be
leached out by the irrigation water. 

An investigation into the relation between the supply of nitrogen and the 
development of cotton leads Dr Crowther to the opinion that it is largely owing 
to the closer spacing of modern practice that the plant can make good use of 
added nitrogen, but I should like to ask whether the substitution of the modern 
nitro-chalk for nitrate of soda or ammonium sulphate may not also have had 
a beneficial effect of its own. 

I have now much pleasure in moving a very hearty vote of thanks to Mr 
Bartlett for his paper, and if I have rather strayed from the straight and narrow 
path which he has himself followed, I have done so in the confident expectation 
that my lapse will be atoned for by the speakers who will follow. 